You're reading the public-facing archive of the Category Theory Zulip server.
To join the server you need an invite. Anybody can get an invite by contacting Matteo Capucci at name dot surname at gmail dot com.
For all things related to this archive refer to the same person.
Has anyone ever looked into Markov categories that are also traced monoidal, describing steady states of Markov chains?
This seems to me like a Thing That Ought To Be Done
@Mateo Torres-Ruiz and I were just wondering about this! A first (unsatisfactory) answer is that you can embed the category of stochastic maps into that of linear maps with the tensor as monoidal product, which is traced, at least in the finite-dimensional case---I'll let people who are knowledgeable in functional analysis tell us how to generalise this to infinite dimensions. But I suppose the (partial) trace of a stochastic map might not be stochastic, which is why I said that was unsatisfactory.
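A tiny numerical illustration of that caveat (a hypothetical numpy sketch, not from the discussion): the linear trace of a stochastic matrix, viewed as a partial trace over X of a map 1 ⊗ X → 1 ⊗ X, need not be stochastic.

```python
import numpy as np

# A perfectly good stochastic matrix on a 2-element set: the swap.
swap = np.array([[0.0, 1.0],
                 [1.0, 0.0]])

# Viewing it as a map 1 (x) X -> 1 (x) X and tracing out X gives the
# ordinary matrix trace. A stochastic "scalar" 1 -> 1 would have to be 1.
print(np.trace(swap))  # 0.0 -- not stochastic
```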
And this might not be the sort of trace you want if steady states of Markov chains is what you're studying. You'd prefer some additive (particle-style in GoI terms?) trace, computed as the sum of iterated powers of your stochastic matrix.
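For the additive idea, here is a rough numerical sketch (hypothetical; numpy assumed): Cesàro-averaging the iterated powers of a stochastic matrix recovers a steady state of the chain.

```python
import numpy as np

# A 3-state row-stochastic matrix: row i is the transition
# distribution out of state i.
P = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.7, 0.1],
              [0.0, 0.3, 0.7]])

# Cesaro average of iterated powers: (1/N) * sum_{k=0}^{N-1} P^k,
# which converges to a projection onto the steady states.
N = 2000
acc = np.zeros_like(P)
Pk = np.eye(3)
for _ in range(N):
    acc += Pk
    Pk = Pk @ P
avg = acc / N

pi = avg[0]  # any row approximates a stationary distribution
print(np.allclose(pi @ P, pi, atol=1e-2))  # pi is (approximately) steady
```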
Hmm, I'm pretty skeptical of traced Markov categories, since wouldn't a traced SMC whose monoidal unit is terminal be highly degenerate in some way? Like being the terminal category (up to equivalence)? I can't prove it right now, but this is probably what happens based on the intuition that the trace disconnects (towards the codomain) if the monoidal unit is terminal. (corrected below)
So maybe you'd rather want an abstract version of the category of substochastic matrices, similar to what Robin suggests. This looks a lot like you may want traced gs-monoidal (2-)categories, though I'm not very familiar with these.
In either case I'm not aware of anyone having looked into it. With @Paolo Perrone and Sean Moss we have some preliminary results towards a theory of stochastic dynamical systems, but these go more in the direction of ergodic decomposition and ergodic theorems. But of course there may be things that I'm not aware of.
Tobias Fritz said:
Hmm, I'm pretty skeptical of traced Markov categories, since wouldn't a traced SMC whose monoidal unit is terminal be highly degenerate in some way?
Hmm... I don't think so. There are plenty of traced cartesian categories around (e.g. relations with the disjoint sum as monoidal product, which is a biproduct in this case) and their monoidal unit is terminal.
Ah, right, thanks! I knew that at some point, but apparently lost trace of it.
I'm pretty sure you can't have a category that's both compact closed and Markov without significant degeneracy (probably a preorder).
But traced should be ok.
Robin Piedeleu said:
I'm pretty sure you can't have a category that's both compact closed and Markov without significant degeneracy (probably a preorder).
Just equivalent to the terminal category: if a category is compact closed and semicartesian, then every hom-set is a singleton.
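The claim above can be sketched in one line (using the compact-closure adjunction and terminality of the unit I, which semicartesianness gives):

```latex
\operatorname{Hom}(A,B)\;\cong\;\operatorname{Hom}(A\otimes B^{*},\,I)\;=\;\{\,!_{A\otimes B^{*}}\,\},
```

so every hom-set is a singleton.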
Yeah, both traced cartesian and traced cocartesian not only exist but they're historically really important in program semantics
I think for example [sets, partial functions] can be equipped with traces for both its cartesian and cocartesian products
Well then, I agree that e.g. studying traces on FinStoch looks like a Thing That Ought To Be Done :sweat_smile: Assuming that it hasn't been done already.
Do you think that Hasegawa's equivalence between traces and fixed point operators still holds for Markov categories? In other words, does his proof require the full universal property of products, or are weaker properties of copying and discarding enough?
Tobias Fritz said:
Just equivalent to the terminal category: if a category is compact closed and semicartesian, then every hom-set is a singleton.
Incidentally this also implies that you can't have both compact closed + cartesian, and it's way simpler than the proof sketch I knew for that: namely, note that the unit is separable (as all cartesian states are), then use the yanking equation to prove that identities are separable, and then use that to contradict the identity axioms of a category
Tobias Fritz said:
Do you think that Hasegawa's equivalence between traces and fixed point operators still holds for Markov categories? In other words, does his proof require the full universal property of products, or are weaker properties of copying and discarding enough?
Good question... without looking I'd guess it most likely pushes a morphism through copying somewhere, but it's probably worth checking
BTW this sounds like something that Prakash Panangaden must have thought about. I see something mentioned in Section 5.4 of Labelled Markov Processes, but don't have time now to look into it further.
if it is true that markov categories are (symmetric?) monoidal with a commutative comonoid on each object, then such categories were studied in the 80s, in bialgebra cohomology, with partial traces in the representations. eg gerstenhaber had papers about that. even bob pare wrote a couple, whom category theorists should know. see the references of tom fox's paper in JPAA where he shows that the category of coalgebras is cofree cartesian over monoidal. it's a fairly well-known paper. in CS, categories with a comonoid on every object arose maybe in the early 90s in concurrency, from milner's action calculi. i think john power wrote some papers with milner about that...
what i don't understand is what andrej markov has to do with this structure :)
i am not asking in a negative way. i really don't understand. the fact that some markov chain phenomena involve copying and deleting does not mean that they are an essential feature. if they were, then we should be able to construct markov chains whenever the markov category structure is available. isn't any cartesian category a markov category? but the fact that the morphisms preserve copying makes everything deterministic. shouldn't we be able to present at least the unbiased coin in every markov category?
Markov categories are supposed to axiomatise categories whose morphisms are some kind of Markov kernel... deterministic functions are "Markov kernels for 2-valued probability"
2-valued probability is a bit of a ghost, isn't it? a 2-valued probability measure on a powerset is an ultrafilter. andrej markov did study pushkin's poetry, but it takes a lot of poetry to see a markov chain in an ultrafilter.
I think the ultimate answer is that quite a lot of probability theory can be done in a generic Markov category, although obviously not all of it. It seems like a sweet spot between how complicated the axioms are and how much you can get out of them
dusko said:
if it is true that markov categories are (symmetric?) monoidal with a commutative comonoid on each object, then such categories were studied in the 80s, in bialgebra cohomology, with partial traces in the representations. eg gerstenhaber had papers about that. even bob pare wrote a couple,
Interesting! The earliest references I know of so far are from the 90s from the term graph community, such as Gadducci's thesis. Do you have some concrete pointers to papers from the 80s? Looking at Bialgebra cohomology, deformations, and quantum groups by Gerstenhaber and Schack, I don't see mention of any general class of categories in there, let alone SMCs with a commutative comonoid on each object. I also can't find anything by Bob Paré on bialgebra cohomology.
i am not asking in a negative way. i really don't understand. the fact that some markov chain phenomena involve copying and deleting does not mean that they are an essential feature. if they were, then we should be able to construct markov chains whenever the markov category structure is available. isn't any cartesian category a markov category? but the fact that the morphisms preserve copying makes everything deterministic.
To add a bit to what Jules already said, I think that it is indeed quite surprising that copying and deleting are an essential feature of probability, and this is arguably why the topic has appeared only in the past few years. Here's a rough sketch of how to arrive at this structure from elementary probability.
First, note that product distributions, or equivalently stochastic independence, play a crucial role in probability. Therefore it's sensible to axiomatize Markov kernels as an SMC. But for many statements of probability, this is not enough! Consider for example the equation P(x,y) = P(x) P(y|x), which one may take as the definition of the conditional probability P(y|x). This equation has two occurrences of x on the right-hand side, implying that it cannot be translated into pure SMC theory. This is where the comonoid structures come to the rescue.
On the other hand, one clearly cannot expect to have a cartesian monoidal category, since a joint distribution of two random variables is not determined by the marginal distributions of the individual variables. In a sense, this is what probability theory is all about. Therefore one cannot expect the comultiplications to be natural. Nevertheless, product projections are still natural transformations.
So we're arriving at the concept of SMCs with commutative comonoid structures, compatible with the monoidal structure, and such that projections are still natural. And that's (one formulation of) the definition of Markov category.
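A concrete finite sanity check of the chain rule P(x,y) = P(x) P(y|x) (a hypothetical Python sketch, with dicts as finite distributions), highlighting the double occurrence of x that copying provides:

```python
# A made-up joint distribution p(x, y) on {0,1} x {0,1}
joint = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.4, (1, 1): 0.2}

# marginal p(x) and conditional p(y|x)
px = {x: sum(p for (a, _), p in joint.items() if a == x) for x in (0, 1)}
cond = {(x, y): p / px[x] for (x, y), p in joint.items()}

# chain rule p(x, y) = p(x) * p(y|x): note that x is used *twice* on the
# right-hand side, which is exactly what the copy map provides
assert all(abs(joint[(x, y)] - px[x] * cond[(x, y)]) < 1e-12
           for (x, y) in joint)
print("chain rule checks out")
```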
shouldn't we be able to present at least the unbiased coin in every markov category?
I would say no more or less than we "should" be able to present e.g. Cartesian coordinates in every cartesian category.
thank you for the nice explanation. i am quite impressed and curious about the way you and i guess lots of other people have been developing probability in categories. but since it seems to be a real movement, maybe it could escape the usual workflow of Working Mathematics for Category Theorists, and make itself really useful to real practitioners? categorical tools are genuinely useful, but the friendly reception within our own little community often prevents us from taking up the real challenges.
so let me try to be critical for a minute, not because i am critical, but just in case that it might be useful.
someone said that the title of the project is "synthetic probability theory". if you look at synthetic differential geometry, or even hilbert's synthetic geometry, the idea is to supply a set of axioms sufficient to completely derive the essential parts of the theory, without borrowing ready-made parts from it. recognizing some necessary properties is nice, but if these properties are shared with confusingly different theories, we may confuse ourselves. in any case, finding tighter axioms prevents such confusions. (eg kolmogorov spent a great part of his life trying to provide a synthetic reconstruction of probability theory, better than the one from his youth in that it would also engender a theory of information and statistics. the resulting theory of randomness is running strong, but all agree that it is in need of a language...)
Tobias Fritz said:
dusko said:
if it is true that markov categories are (symmetric?) monoidal with a commutative comonoid on each object, then such categories were studied in the 80s, in bialgebra cohomology, with partial traces in the representations. eg gerstenhaber had papers about that. even bob pare wrote a couple,
Interesting! The earliest references I know of so far are from the 90s from the term graph community, such as Gadducci's thesis. Do you have some concrete pointers to papers from the 80s? Looking at Bialgebra cohomology, deformations, and quantum groups by Gerstenhaber and Schack, I don't see mention of any general class of categories in there, let alone SMCs with a commutative comonoid on each object. I also can't find anything by Bob Paré on bialgebra cohomology.
comonoids and the explanations of their meaning are there in the papers that you mention, and bob pare's paper is surely cited in tom fox's. but people were not so interested in categories as such, but in problems that get solved using them. if you want explicit analysis of comonoids as a categorical structure, then that would bring us from papers solving math problems to papers analyzing categorical structures. i am afraid that i cannot avoid mentioning
https://www.cambridge.org/core/journals/mathematical-structures-in-computer-science/article/abs/categorical-logic-of-names-and-abstraction-in-action-calculi/7C1B12F9B6EE6994E4EDB4CC88DE0E2C
maybe there are more references there. the whole thing goes back to lambek's observation that a comonoid, ie the comonad induced by it, ie the kleisli category of that comonad, is the categorical version of freely adjoining a variable. an algebraic construct. plotkin also used such things in some of his lics papers, i think.
First, note that product distributions, or equivalently stochastic independence, play a crucial role in probability. Therefore it's sensible to axiomatize Markov kernels as an SMC. But for many statements of probability, this is not enough! Consider for example the equation P(x,y) = P(x) P(y|x), which one may take as the definition of the conditional probability P(y|x). This equation has two occurrences of x on the right-hand side, implying that it cannot be translated into pure SMC theory. This is where the comonoid structures come to the rescue.
ah but the algebraic manipulations of the occurrences of x and y is what made people call them the random variables. a variable is a thing that can be copied and deleted. the fact that it needs to be copied when we reason about conditional probability is not a property of conditional probability, but a feature of the algebra of random variables. edward nelson reconstructed the entire "radically elementary probability theory" from that. but lambek also switched between variables and comonoids/comonads in all kinds of contexts, from rings and modules to deductive systems and cartesian categories. (see either of his books on these subjects.)
bob coecke and i used the comonoid structures to recognize classical states in quantum universes because the comonoid structure allows copying and deletion, which the quantum states do not permit:
https://arxiv.org/abs/0904.1997
uh sorry i don't mean to spam the world by advertising my work but it occurs to me now that in that paper we distinguished not only classical/quantum processes using the comonoids, but also stochastic processes, precisely with your totality requirement. this didn't occur to me until now. but that was just peeling pieces of structure that happen to be there, with no hope to really characterize stochasticity in such terms...
shouldn't we be able to present at least the unbiased coin in every markov category?
I would say no more or less than we "should" be able to present e.g. Cartesian coordinates in every cartesian category.
well, the cartesian coordinates got descartes' name because he started denoting abstract points by tuples of variables. the cartesian categories got descartes' name from cartesian products, whose elements are the tuples that can be referred to by variables. the elements of a tensor product cannot be denoted by such coordinates. the way cartesian structure allows manipulating abstract cartesian coordinates was worked out in early semantics of computation, cf lambek-scott's book...
sorry about the flood of words. it is because i really like what you guys have been doing. it would be even better if you would do more :)
QUESTION: take an arbitrary commutative monad T on Set. Let C be the slice of the Kleisli category Set_T over T1. the cartesian structure of Set induces a monoidal structure on C, with a comonoid on every object X, whose counit is the terminal arrow X → T1. This is a Markov category, isn't it? If that structure supports probability theory, we have a central limit theorem, a family of poisson processes, a law of small numbers. Can we really reason probabilistically under any commutative monad on Set?
Jules Hedges said:
I think for example [sets, partial functions] can be equipped with traces for both its cartesian and cocartesian products
The category of sets and partial functions, which I'll call Par, with tensor = coproduct is indeed a traced (cocartesian) monoidal category. The trace is described on page 3 of https://www.mathstat.dal.ca/mfps2018/preproc/paper_8.pdf (in that paper they also talk about partially defined trace operators)
On the other hand, I don't know of a trace for Par with respect to the Cartesian product (which, it is worth mentioning, is not a categorical product but rather a restriction product in the sense of restriction categories: https://arxiv.org/abs/math/0610500).
Robin Piedeleu said:
I'm pretty sure you can't have a category that's both compact closed and Markov without significant degeneracy (probably a preorder).
Out of curiosity: what about Markov and *-autonomous?
Tobias Fritz said:
Do you think that Hasegawa's equivalence between traces and fixed point operators still holds for Markov categories? In other words, does his proof require the full universal property of products, or are weaker properties of copying and discarding enough?
Well I'm currently Hasegawa's postdoc and we're working on traced monoidal categories stuff. I could ask him what he thinks!
Thanks for the kind words @dusko ! It's always interesting to hear your thoughts, given your unusual depth of insight and breadth of knowledge.
dusko said:
i am quite impressed and curious about the way you and i guess lots of other people have been developing probability in categories. but since it seems to be a real movement, maybe it could escape the usual workflow of Working Mathematics for Category Theorists, and make itself really useful to real practitioners? categorical tools are genuinely useful, but the friendly reception within our own little community often prevents us from taking up the real challenges.
Right, completely agreed! I guess everyone has their own attitude and plans in this respect, so I can only speak for myself: I'm personally interested mainly in fundamental things, or in other words I'm more of a philosopher than a practitioner. So I won't take up the real challenges myself. But fortunately there are many others here who work on real challenges with ACT. With Markov categories specifically, for example the people at Planting.Space seem to find the theory valuable and use it at the theoretical end of things.
Edit: if traditional probability theorists also count as "practitioners", then I'm even personally interested and involved in making the formalism useful to them. For example, our paper on de Finetti has appeared in a stochastic analysis journal.
someone said that that the title of the project is "synthetic probability theory". if you look at synthetic differential geometry, or even hilbert's synthetic geometry, the idea is to supply a set of axioms sufficient to completely derive the essential parts of the theory, without borrowing ready-made parts from it. recognizing some necessary properties is nice, but if these properties are shared with confusingly different theories, we may confuse ourselves. in any case, finding tighter axioms prevents such confusions.
Indeed. For now we may still be quite far from a complete set of axioms, but that is one of the big goals of the theory. I like to compare it with abstract algebra as a synthetic theory of number systems: we can characterize ℝ as the unique complete Archimedean ordered field, but there's also a lot of other stuff in field theory and commutative algebra that's interesting, and plenty of theorems that apply at various levels of generality. So for me, synthetic probability is more of a (partially ordered) hierarchy of theories; let me say a bit more on that below. On the other hand, there's also Alex Simpson's synthetic probability theory, which is not directly related to Markov categories and intended to be complete (at least morally).
On the references discussion:
comonoids and the explanations of their meaning are there in the papers that you mention, and bob pare's paper is surely cited in tom fox's.
Sorry about being dense, but I'm still not sure what you're referring to. Is it Thomas Fox, Coalgebras and cartesian categories? That is from 1976 (and not JPAA), so it can't cite bialgebra cohomology papers from the 80s, and there's no citation to Paré either. So maybe you have a different Tom Fox paper in mind? Do you have a link?
if you want explicit analysis of comonoids as a categorical structure, then that would bring us from papers solving math problems to papers analyzing categorical structures. i am afraid that i cannot avoid mentioning
https://www.cambridge.org/core/journals/mathematical-structures-in-computer-science/article/abs/categorical-logic-of-names-and-abstraction-in-action-calculi/7C1B12F9B6EE6994E4EDB4CC88DE0E2C
Interesting! The definition that you've given there,
[image.png: the definition from the paper]
looks somewhat related to the definition of Markov category. Now I don't know anything about action calculi, which makes the paper a bit difficult for me to follow, and I haven't really grasped the admissibility condition yet. So let me just ask here: your definition doesn't seem to require that the comonoid structure on a tensor product must be the tensor product comonoid structure, although this is a natural and important condition. Or can this be obtained from your admissibility somehow? If so, for what choice?
ah but the algebraic manipulations of the occurrences of x and y is what made people call them the random variables. a variable is a thing that can be copied and deleted. the fact that it needs to be copied when we reason about conditional probability is not a property of conditional probability,
Yes, it is a property of conditional probability or conditioning, in the sense that copying of variables is necessary to express it. Not so for other concepts in probability, such as independence, where we can simply write P(x,y) = P(x) P(y) and no copying of variables occurs.
but a feature of the algebra of random variables.
Sure, but I don't see that as being substantially different, since the multiplication in an algebra is just the Heisenberg picture or predicate transformer version of the duplication of variables. It's the comonoid structures that are important at the synthetic level, and this can manifest itself semantically either as an actual copying or diagonal map, or as the formal dual of a multiplication or monoid structure.
bob coecke and i used the comonoid structures to recognize classical states in quantum universes because the comonoid structure allows copying and deletion, which the quantum states do not permit:
https://arxiv.org/abs/0904.1997
Right. However, the Frobenius structures used in that paper and others by Bob are actually not suitable for measure-theoretic probability, since there it's really only the comonoid structures that exist, but no multiplications and no units. And surely any formalism laying claim to words like "synthetic probability theory" must have a way of accounting for measure theory (even if that is by arguing it away).
I've been simplifying a little bit, since one can at least have multiplications also in measure-theoretic probability (resulting in non-unital Frobenius structures) if one is willing to drop normalization of probability, but for synthetic probability this seems to be the less useful approach (and I could elaborate if anyone wants more details).
dusko said:
QUESTION: take an arbitrary commutative monad T on Set. Let C be the slice of the Kleisli category Set_T over T1. the cartesian structure of Set induces a monoidal structure on C, with a comonoid on every object X, whose counit is the terminal arrow X → T1. This is a Markov category, isn't it?
Not quite, since the comultiplications will generally not be morphisms in the slice category. But if you restrict yourself to those objects for which this is the case, then yes, you get a Markov category. The most relevant case of this construction is when T1 ≅ 1 (T is affine), since then it's just the Kleisli category itself that is a Markov category, and this works whenever T lives on a cartesian category. We call Markov categories that arise like this (in a nice way) representable.
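For the affine case, a toy Python sketch (hypothetical, finite distributions only) of the Kleisli construction, which also illustrates the earlier point that copy is not natural:

```python
# A minimal finite-distribution monad: a distribution is a {value: prob} dict
def unit(x):
    return {x: 1.0}

def bind(d, k):
    """Kleisli extension: compose a distribution d with a kernel k."""
    out = {}
    for x, p in d.items():
        for y, q in k(x).items():
            out[y] = out.get(y, 0.0) + p * q
    return out

# This monad is affine (distributions on a one-point set are trivial), so its
# Kleisli category is a Markov category with copy(x) = unit((x, x)).
coin = {0: 0.5, 1: 0.5}
copied = bind(coin, lambda x: unit((x, x)))
print(copied)  # {(0, 0): 0.5, (1, 1): 0.5} -- perfectly correlated

# Copy is *not* natural: copying a coin flip is not two independent flips,
# whose joint would be {(0,0): .25, (0,1): .25, (1,0): .25, (1,1): .25}.
```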
If that structure supports probability theory, we have a central limit theorem, a family of poisson processes, a law of small numbers. Can we really reason probabilistically under any commutative monad on Set?
No, most constructions and theorems of probability theory will require additional hypotheses to be applicable. For example, not every Markov category has conditionals, and the existence of conditionals is a crucial hypothesis in our proof of the de Finetti theorem. In fact, the only nontrivial Markov categories that we know of that satisfy all the relevant hypotheses are those of measure-theoretic probability. For zero-one laws, one can get by with weaker hypotheses, which makes those results also apply to suitable categories of relations. So here we have a hierarchy of relevant assumptions, and I think of it as vaguely analogous to assumptions one makes on rings and fields in algebra, all the way from general rings to (for example) complete Archimedean ordered fields.
The central limit theorem and things like that are big words, and getting a good categorical treatment of more quantitative things like that is very challenging. We have some preliminary results on the law of large numbers and some basics of ergodic theory with Markov categories, including a version of ergodic decomposition. But these are again qualitative things that are easier to do.
What I gather from your comments overall is a certain skepticism towards using the term "synthetic probability theory" for what we're doing. That's a valid criticism, and we should perhaps be careful with using that term at the present stage where we can only reproduce a small number of qualitative results from probability and statistics, and we don't yet have a detailed idea of how to approach more quantitative theorems.
Jules Hedges said:
Tobias Fritz said:
Do you think that Hasegawa's equivalence between traces and fixed point operators still holds for Markov categories? In other words, does his proof require the full universal property of products, or are weaker properties of copying and discarding enough?
Good question... without looking I'd guess it most likely pushes a morphism through copying somewhere, but it's probably worth checking
So I asked Hassei (Hasegawa), and he says that yes naturality of the copy map is essential for the equivalence. That said, he says that in some cases, you can still obtain a fixed point operator from a trace even if the copy map is unnatural.
So for traced Markov categories, it does not seem like the trace would be equivalent to a sort of fixed point operator.
I can only speak for myself as a "consumer" of categorical probability theory - I've found that Markov categories are a really nice setting for open games. You need a supply of comonoids for intuitive game-theoretic reasons, for example because some move needs to be known by both subsequent players and the payoff function. And the axiom that the monoidal unit is terminal turns out to be useful for technical reasons: it makes some boring complications that appear in the general monoidal case go away
Whether this counts as "practical" depends on what you think about the open games project, but we're trying (although the practical end is all code, and we've never attempted to write polymorphic code in Markov categories terms, they stay in back-of-shop)
JS Pacaud Lemay (he/him) said:
So for traced Markov categories, it does not seem like the trace would be equivalent to a sort of fixed point operator.
I also twote the question about traced Markov categories, and Davidad replied pointing out that Markov chains don't have unique steady states, so you'd have to pick a canonical one like "trace = maximum entropy steady state" (for example, so the trace of the identity map would give a uniform distribution). So I'd vaguely conjecture that this gives an example of a traced Markov category
So how exactly would you define the trace of a general stochastic map between finite sets? I think that it'll be difficult to get the naturality to work.
I guess the general idea is to take Tr(f) to be f with a steady state of f fed into the feedback wire and the outgoing copy discarded. If it fails to be natural then I guess the idea just doesn't work
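A numerical probe of this candidate (a hypothetical numpy sketch; the recipe, the array encoding, and the function names are my own): feed a steady state of the X-dynamics into the feedback wire and discard the outgoing X. The result is at least stochastic; sliding a map around the loop changes the steady state, which is the naturality worry.

```python
import numpy as np

rng = np.random.default_rng(0)

def steady_state(K, iters=2000):
    """Approximate a steady state of the column-stochastic matrix K[y, x]."""
    v = np.full(K.shape[0], 1.0 / K.shape[0])
    for _ in range(iters):
        v = K @ v
    return v

def candidate_trace(F):
    """F[b, y, a, x] = P(b, y | a, x), a stochastic map A (x) X -> B (x) X.
    Hypothetical trace: for each input a, average over a steady state of the
    induced X-chain, then discard the outgoing X."""
    B, _, A, _ = F.shape
    out = np.zeros((B, A))
    for a in range(A):
        K = F[:, :, a, :].sum(axis=0)                 # X-chain K[y, x] at input a
        sigma = steady_state(K)
        out[:, a] = F[:, :, a, :].sum(axis=1) @ sigma  # marginalize y, average x
    return out

# random stochastic F: entries over (b, y) sum to 1 for each input (a, x)
F = rng.random((2, 2, 2, 2))
F /= F.sum(axis=(0, 1), keepdims=True)

T = candidate_trace(F)
print(np.allclose(T.sum(axis=0), 1.0))  # True: the candidate is stochastic

# sliding a stochastic g : X -> X around the loop changes the steady state,
# so the two sides of the sliding/naturality axiom need not agree
g = rng.random((2, 2))
g /= g.sum(axis=0, keepdims=True)
lhs = candidate_trace(np.einsum('zy,byax->bzax', g, F))  # (id (x) g) o f
rhs = candidate_trace(np.einsum('byax,xw->byaw', F, g))  # f o (id (x) g)
print(np.abs(lhs - rhs).max())  # generically nonzero
```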
Hi, I think traces/recursion are interesting and we’ve thought about it a bit. For a simple example we might consider the trace of the map X → X ⊗ X
given by s ↦ (t, t) with t = cons(flip, s),
where X is an object of infinite sequences of booleans, flip is a fair coin, and cons adds something to the front of an infinite sequence.
The trace gives us a map 1 → X, i.e. a random infinite sequence.
The "naturality" axiom of traced monoidal categories says that you can pull the Bernoulli to the outside. So this random sequence would be a random choice between an infinite stream of heads and an infinite stream of tails, and it cannot be an infinite stream of coin flips.
We made a Haskell library, lazyppl, to experiment with categorical probability. And we explored various kinds of recursion there… I can define a trace operator and define this random sequence using it. (We do also have infinite streams of coin flips in lazyppl.) The underlying category I have in mind is pointed wcpos or wQBSs, so when I wrote booleans above I really meant lifted booleans. A connection between traces and "mfix" is in the traced premonoidal categories paper.
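A toy Python rendering of the two readings (hypothetical; finite prefixes instead of lazy streams, and not lazyppl): under the naturality axiom the Bernoulli is sampled once outside the loop, so the traced stream is constant.

```python
import random

random.seed(7)

def flip():
    return random.random() < 0.5

def traced_stream(n):
    """The reading forced by naturality: the Bernoulli is pulled outside
    the loop, sampled once, then cons'd forever: s = cons(b, s)."""
    b = flip()
    return [b] * n

def naive_stream(n):
    """The reading one might have hoped for: a fresh flip per unfolding."""
    return [flip() for _ in range(n)]

print(len(set(traced_stream(20))))  # 1: all heads or all tails
```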
Tobias Fritz said:
I've been simplifying a little bit, since one can at least have multiplications also in measure-theoretic probability (resulting in non-unital Frobenius structures) if one is willing to drop normalization of probability, but for synthetic probability this seems to be the less useful approach (and I could elaborate if anyone wants more details).
I'm curious to hear more about this. Is it possible to drop normalization, but keep a requirement of sub-normalization? It always struck me that Frobenius multiplication gives a "conditioning" operation, and so "ought to" fit naturally into a categorical account of probability theory, somehow.
In some cases, the ability to form cycles is responsible for the loss of normalization, and I've wondered if moving to *-autonomous categories might be an improvement in this case (restricting how one can compose diagrams in order to maintain (sub-)normalization), and indeed I believe one can still form a reasonable generalized definition of "Frobenius algebra on every object" in this setting. I'd be curious to hear any thoughts you might have around any of this!
Chris Barrett said:
I'm curious to hear more about this. Is it possible to drop normalization, but keep a requirement of sub-normalization? It always struck me that Frobenius multiplication gives a "conditioning" operation, and so "ought to" fit naturally into a categorical account of probability theory, somehow.
That is indeed intuitive, but the main trouble with the idea is in measure-theoretic probability, where individual outcomes all typically have probability zero. This runs into a couple of related issues, one being that for continuous variables, the subnormalized version of conditioning typically gives identically 0, while the usual "normalized" conditioning still makes sense and is useful (see regular conditional probability). Another is that Frobenius monoids are typically finite(-dimensional) or compact in some way, and it's difficult to imagine a definition of morphism that would achieve that for the kinds of structures that appear in measure theory (or even potential future replacements of measure theory). A third point is that having normalization is really important for distributions on infinite products, or equivalently situations with infinitely many random variables. If you have infinitely many independent coin flips, and each of them is merely subnormalized, then the total normalization would again be zero.
Perhaps one could get around all of this e.g. by allowing infinitesimal probabilities, so my point is just to indicate that it's difficult to deal with infinite things like this in a subnormalization setting. And presumably we still want to be able to model situations with infinitely many continuous variables, at least as convenient idealizations.
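The normalization point above in a back-of-envelope check (hypothetical numbers):

```python
# If each of n independent coins is subnormalized with total mass 1 - eps,
# the joint distribution of all n has total mass (1 - eps)**n -> 0.
eps = 0.01
for n in (1, 10, 100, 1000):
    print(n, (1 - eps) ** n)
```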
Perhaps more importantly, it also turns out that there is no need for anything more than a comonoid structure, since the latter is already sufficiently expressive for a lot of things (like conditionals), and I can't think of any general construction or theorem in probability where one would require something like a Frobenius structure in order to state it. So even if one could use Frobenius structures, one can argue that it would be overkill.
As for *-autonomous categories, I don't see why one would expect those to be relevant. What would be a candidate object that we would expect to not be isomorphic to its own dual? (If I understand correctly, if all objects are self-dual, then we're back in the compact closed case, right?) For example, we would want to have an object that corresponds to the space of outcomes of countably many coin flips, something like 2^ℕ. Would you expect this object to be self-dual or no? If not, how should I think about its dual?
Going into good solid scientific wild speculation, my first thought was that if A is "the type of random processes/random variables" then its dual A^⊥ ought to be "the type of events/measurements" (although we can expect to run into the same normalisation issues, probably)
Thank you for the response! It seems that -- as one might expect -- working in infinite dimensions causes issues for normalization and in particular for the Frobenius structure. This sounds like a familiar issue in, e.g., categorical quantum mechanics -- although, there they do need the extra structure, whereas you're right to point out that you don't seem to!
I will quickly add that I'm not an expert in this area, especially in the functional analysis/measure theory necessary to work properly with infinite dimensions. I come from a program semantics background, but I've been following developments in ACT keenly. One motivation I have to consider Frobenius multiplication is the following: my upcoming PhD thesis develops a variant of the lambda calculus which (among other remarkable things) constitutes a term language for string diagrams, characterizing (by various first-order fragments) the free Cartesian category, SMC, Markov category, hypergraph category and the free CDU category. Here, Frobenius multiplication (in the form of equality checking), surprisingly, appears to arise very naturally in the syntax, which seems to suggest to me (if you take the language seriously) that this is a natural structure for computing with (for example, implementing exact conditioning in a PPL). This leads me to want such a structure in my categorical semantics as well. However, I have not developed any of this with infinite-dimensional semantics in mind, and there would no doubt be problems arising in doing so (not least, the ones you mention). Sorry if some of this is a bit cryptic: at some point I hope to write all this up properly for discussion here, but I don't want to derail this thread.
Regarding the *-autonomous structure, there is some work on using *-autonomous categories to extend results from CQM to infinite dimensions (see Srinivasan's thesis, "Dagger Linear Logic and Categorical Quantum Mechanics"), where they additionally generalize the definition of Frobenius algebra to a *-autonomous setting. Similar things have been done before, where A and its dual A^⊥ are the types of the Frobenius monoid and comonoid, which can then interact via the linear distributive law. This might serve as a direction in which to answer your second point.
Regarding duality, in the field of program semantics, there is an important model of higher-order probabilistic programming called probabilistic coherence spaces. These form a *-autonomous category, and thus have a notion of duality. If we think of P as being some set of states (non-negative vectors over a countable index set), then P^⊥ can be thought of as the set of states which "(sub-)normalize" every state of P, i.e. those y with ⟨x, y⟩ ≤ 1 for all x ∈ P. Objects are then subsets P such that P^⊥⊥ = P. Consider atomic types here as being the type of sub-normalized states (of possibly countable dimension), closed under double dual. First-order processes A ⊸ B are processes (matrices) taking (sub-)normalized states of A to (sub-)normalized states of B. And for higher-order processes, an example: (A ⊸ B) ⊸ C is the type of processes (matrices) taking first-order processes of type A ⊸ B to (sub-)normalized states of C. Note this is _not_ degenerate: it is not a compact closed category.
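A minimal finite-dimensional sketch of the duality Chris describes, assuming the usual PCS convention that y lies in the dual of P iff ⟨x, y⟩ ≤ 1 for every x ∈ P (the function name and the tiny two-point example are mine):

```python
import numpy as np

def in_dual(y, gens):
    """Check whether y lies in the dual of the set generated by `gens`,
    i.e. <x, y> <= 1 for every generator x (hence for their convex hull)."""
    return all(float(np.dot(x, y)) <= 1.0 + 1e-12 for x in gens)

# States on a 2-element set, generated by the two point masses:
P = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]

# The all-ones covector (sub-)normalizes every state of P ...
print(in_dual(np.array([1.0, 1.0]), P))   # True
# ... but a covector exceeding 1 on some point does not:
print(in_dual(np.array([1.5, 0.0]), P))   # False
```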
Just to reiterate, I don't consider myself an expert here, I'm just trying to give some context and motivation for my naive questions. Obviously, I don't claim any of the above remarks solve the issues with infinite dimensions, nor do I intend to directly contradict anything you said! I hope none of this is derailing the topic too much.
Jules Hedges said:
Going into good solid scientific wild speculation, my first thought was that if A is "the type of random processes/random variables" then its dual A^⊥ ought to be "the type of events/measurements" (although we can expect to run into the same normalisation issues, probably)
That was my first thought as well, but then we seem to be working in a different categorical framework, in the sense that the morphisms are no longer interpreted as Markov kernels. And so then I wouldn't know how to think of a morphism in general. Any ideas?
@Chris Barrett, that makes good sense. I guess it's important to distinguish two separate issues concerning the possibility of having a Frobenius structure (leaving aside the desirability question):
I understand that there may be situations where one really wants to have an equality predicate, and therefore the multiplication. In such a context, it's certainly feasible to move to subnormalization even in measure-theoretic probability, and I think that one will then have non-unital Frobenius structures. (There's no duality since there are no cups, so no problems with infinite-dimensionality.) By what I said above, I think the price to pay is that one probably cannot expect conditioning to make sense, and infinite tensor products or having infinitely many random variables is not going to work either. But I can imagine that these costs are worth the benefit in some applications.
Concerning units on the other hand, maybe you don't actually want those? They seem quite unnatural from the measure theory perspective: there is nothing like a distinguished "uniform distribution" on most infinite measurable spaces, and even in the finite setting I would say that their relevance for a foundational framework of probability is questionable at best.
I've heard of probabilistic coherence spaces, but unfortunately linear logic has always been obscure to me... Very vaguely, is it fair to say that the category of probabilistic coherence spaces is something like (closed) convex cones in (topological) vector spaces, and morphisms certain positive linear maps? If so, then that sounds very much like the category of what quantum physicists would call generalized probabilistic theories. This sort of thing seems to be of interest in the foundations of quantum mechanics, but I think that it's not so useful in the context of classical probability since the comultiplications or copy morphisms no longer exist.
Tobias Fritz said:
Concerning units on the other hand, maybe you don't actually want those? They seem quite unnatural from the measure theory perspective: there is nothing like a distinguished "uniform distribution" on most infinite measurable spaces, and even in the finite setting I would say that their relevance for a foundational framework of probability is questionable at best.
This claim is wild. Maybe you can't expect to have such a thing on a maximal category of measurable spaces, but spaces which do admit uniform distributions are far more prevalent and useful than wild measure spaces that don't, at least in my limited experience of probability theory, so you should definitely not be dismissing categories of such things.
I don't think that I've been saying anything particularly surprising or controversial, so let me try to clarify. First of all, I'm not particularly interested in "wild" measurable spaces: it's questionable whether measure theory in its usual form is appropriate for "large" spaces, and there are ongoing attempts at finding a better theory. It's perfectly fine if we restrict ourselves to the standard Borel spaces, which are those that arise from separable and complete metric spaces. There everything works nicely.
Now let's get into what we mean by a "uniform distribution". In my earlier post, the notion that I had in mind would be something like "a distinguished morphism u_X : I → X for every object X, such that these are preserved by isomorphisms and multiplicative under tensor". But already isomorphism invariance is too strong: the usual "uniform distribution" on [0,1] isn't invariant under the automorphism x ↦ x². With a bit more work, one can prove that an automorphism-invariant probability measure on [0,1] doesn't exist.
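Tobias's non-invariance point can be made concrete with the squaring map (a small numerical sketch; the function name is mine): the pushforward of the uniform measure on [0,1] along x ↦ x² has CDF √t rather than t.

```python
import math

# CDF of the pushforward of Uniform[0,1] along x -> x^2:
# P(x^2 <= t) = P(x <= sqrt(t)) = sqrt(t), which is not the uniform CDF t,
# so the uniform distribution is not preserved by this automorphism.
def pushforward_cdf(t: float) -> float:
    return math.sqrt(t)

print(pushforward_cdf(0.25))  # 0.5, whereas the uniform CDF gives 0.25
```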
I don't deny that people talk about a "uniform distribution" on [0,1] and that this makes good sense. However, this refers to an incarnation of [0,1] in a category where the objects have additional structure. So I think that the term makes sense whenever one has extra structure which induces a distinguished (finite) reference measure, such as a Riemannian metric with finite volume. But such additional structure is not needed in the foundations of probability: the amazing thing is that the mere structure of (something like) a measurable space is sufficient for many of its constructions and theorems. So basic probability enjoys much stronger invariance properties than these "uniform distributions" do, and presumably we want to preserve that.
Do you still disagree? If so, what do you mean by "uniform distribution"?
Morgan Rogers (he/him) said:
spaces which do admit uniform distributions are far more prevalent and useful than wild measure spaces that don't
I must also point out that ℝ with its standard measure doesn't admit a uniform distribution
Or ℕ for that matter
Tobias Fritz said:
I don't deny that people talk about a "uniform distribution" on [0,1] and that this makes good sense. However, this refers to an incarnation of [0,1] in a category where the objects have additional structure. So I think that the term makes sense whenever one has extra structure which induces a distinguished (finite) reference measure, such as a Riemannian metric with finite volume. But such additional structure is not needed in the foundations of probability: the amazing thing is that the mere structure of (something like) a measurable space is sufficient for many of its constructions and theorems. So basic probability enjoys much stronger invariance properties than these "uniform distributions" do, and presumably we want to preserve that.
Do you still disagree? If so, what do you mean by "uniform distribution"?
I'm on board with seeing what you can do without extra assumptions, but it seemed to me you were asserting that the extra structure wasn't worth considering, when there is clearly plenty of probability theory (or applications thereof) that does involve uniform distributions. I didn't really understand your counterexample for [0,1]; in what category are you working for the morphism to be a distribution while x ↦ x² is an automorphism on [0,1]? (Did you mean for the mapping to be equipped with some extra info?)
The specific example I shall mention (although I've failed for the past half hour to find a nice probability monad whose Kleisli category it can live in) is to consider compact (hence locally compact) topological groups. These have a distinguished uniform probability measure, the Haar measure. I remember my representation theory lecturer back in undergrad explaining that this is what is used to extend results from representation theory for finite groups to compact groups, which is at least modestly important. Of course, I could be mistaken in supposing that these will fit neatly into the Markov category formalism, because there are enough gadgets flying around in this topic that there are likely compatibility conditions that I'm ignorant of, but maybe the suggestion could be motivating in its own right?
Morgan Rogers (he/him) said:
I'm on board with seeing what you can do without extra assumptions, but it seemed to me you were asserting that the extra structure wasn't worth considering, when there is clearly plenty of probability theory (or applications thereof) that do involve uniform distributions.
Right. My statements are all to be understood as living in a specific (and quite foundational) context in which those extra structures are not relevant. And that covers quite a lot of ground already!
I didn't really understand your counterexample for [0,1]; in what category are you working for the morphism to be a distribution while x ↦ x² is an automorphism on [0,1]? (Did you mean for the mapping to be equipped with some extra info?)
Just the Kleisli category of the Giry monad on (say standard Borel) measurable spaces. So morphisms I → X are probability measures on X, and through the Kleisli adjunction every measurable map is still a morphism in here. Considering [0,1] as a measurable space with its Borel σ-algebra as usual, the measurable map x ↦ x² has measurable inverse y ↦ √y and is therefore an automorphism.
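For intuition, here is what Kleisli composition looks like in the finite discrete case (a sketch with the finite distribution monad standing in for Giry; names are mine): composing a state with the 0/1 kernel induced by a deterministic map is exactly the pushforward.

```python
import numpy as np

def kleisli_compose(p: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Compose a state p (probability vector on X) with a kernel k
    (row-stochastic matrix X -> Y) in the Kleisli category."""
    return p @ k

# A deterministic map f: X -> Y embeds as the 0/1 kernel with k[x, f(x)] = 1.
p = np.array([0.5, 0.3, 0.2])      # a state on a 3-element set
f = [1, 0, 1]                      # a deterministic map into a 2-element set
k = np.zeros((3, 2))
for x, y in enumerate(f):
    k[x, y] = 1.0

print(kleisli_compose(p, k))       # the pushforward of p along f: [0.3, 0.7]
```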
The specific example I shall mention (although I've failed for the past half hour to find a nice probability monad whose Kleisli category it can live in) is to consider compact (hence locally compact) topological groups. These have a distinguished uniform probability measure, the Haar measure. I remember my representation theory lecturer back in undergrad explaining that this is what is used to extend results from representation theory for finite groups to compact groups, which is at least modestly important. Of course, I could be mistaken in supposing that these will fit neatly into the Markov category formalism, because there are enough gadgets flying around in this topic that there are likely compatibility conditions that I'm ignorant of, but maybe the suggestion could be motivating in its own right?
Yes, absolutely! It goes a bit beyond the foundational setting that I've been referring to, but it's clearly very important.
As for a categorical framework for this, here's one attempt. The category CHaus of compact Hausdorff spaces carries a probability monad called the [[Radon monad]] R. It follows that its Kleisli category Kl(R) is a Markov category. Its morphisms are continuous Markov kernels, and its subcategory of deterministic morphisms is exactly the usual CHaus. Your group G can now be interpreted as a group object in this category (since the local compactness is trivially true by compactness). Hence we can talk about the category of internal G-spaces in Kl(R), which is again a Markov category. I believe that the existence and uniqueness of the Haar measure amount to saying that when taking G to act on itself (from either side) and considering it as an object in this category, then there is a unique morphism I → G. Right?
If I remember correctly, we got into the discussion about uniform distributions by talking about the relevant categorical structure for probability, where I had claimed that one doesn't want objects to have distinguished units. I think that the categorical formulation of the previous paragraph illustrates my point: the object G does have a distinguished unit, but I wouldn't know what the distinguished unit of a general object should be.
Tobias Fritz said:
I've heard of probabilistic coherence spaces, but unfortunately linear logic has always been obscure to me... Very vaguely, is it fair to say that the category of probabilistic coherence spaces is something like (closed) convex cones in (topological) vector spaces, and morphisms certain positive linear maps? If so, then that sounds very much like the category of what quantum physicists would call generalized probabilistic theories. This sort of thing seems to be of interest in the foundations of quantum mechanics, but I think that it's not so useful in the context of classical probability since the comultiplications or copy morphisms no longer exist.
Indeed it would seem that you're roughly correct about PCS: https://arxiv.org/pdf/1803.06005.pdf
Oh, nice! This looks like it could actually be helpful for someone like me to understand linear logic :sweat_smile:
Tobias Fritz said:
I didn't really understand your counterexample for [0,1]; in what category are you working for the morphism to be a distribution while x ↦ x² is an automorphism on [0,1]? (Did you mean for the mapping to be equipped with some extra info?)
Just the Kleisli category of the Giry monad on (say standard Borel) measurable spaces. So morphisms I → X are probability measures on X, and through the Kleisli adjunction every measurable map is still a morphism in here. Considering [0,1] as a measurable space with its Borel σ-algebra as usual, the measurable map x ↦ x² has measurable inverse y ↦ √y and is therefore an automorphism.
Ah, so the endomorphism is "really" the Dirac-valued kernel x ↦ δ_{x²}. I did the computation of the composition in the Kleisli category, where we indeed get the pushforward measure, as you alluded to.
The category CHaus of compact Hausdorff spaces carries a probability monad called the [[Radon monad]] R. It follows that its Kleisli category Kl(R) is a Markov category. Its morphisms are continuous Markov kernels, and its subcategory of deterministic morphisms is exactly the usual CHaus. Your group G can now be interpreted as a group object in this category (since the local compactness is trivially true by compactness). Hence we can talk about the category of internal G-spaces in Kl(R), which is again a Markov category. I believe that the existence and uniqueness of the Haar measure amount to saying that when taking G to act on itself (from either side) and considering it as an object in this category, then there is a unique morphism I → G. Right?
Almost; in general the left and right Haar measures are different; there's an example further down the Wikipedia entry, but considering the left and right actions independently is fine. Thanks for putting this together, I'm very pleased to discover the potential for such nice interactions between the construction of the category of internal groups and the Kleisli category (I haven't much experience thinking about Markov categories, but being able to build them in many different ways is exciting from the point of view of eventually being able to apply your results in an exotic setting without too much effort).
If I remember correctly, we got into the discussion about uniform distributions by talking about the relevant categorical structure for probability, where I had claimed that one doesn't want objects to have distinguished units. I think that the categorical formulation of the previous paragraph illustrates my point: the object G does have a distinguished unit, but I wouldn't know what the distinguished unit of a general object should be.
I see what you mean. I think this became a debate because I was reading the OP question as asking "what would happen if a Markov category had structure XYZ?" (a pretty reasonable question) rather than "should/do Markov categories come equipped with structure XYZ (for the purposes of synthetic probability theory)?" which you have made a solid argument against. Thanks for taking the time to address my concerns!
Oh, now I can go back and make sense of a lot of the discussion in context. If you have a traced Markov category then every object X admits a canonical state I → X, which can be reasonably expected to behave like a uniform / max entropy distribution at least in "reasonable" examples. That helps for understanding which Markov categories can be expected to admit a trace, and which definitely shouldn't
You can pull the same trick in any traced cartesian category. In a category of computable functions (or a model of computable functions) where trace is non-well-founded recursion, the point fix(id_X) : 1 → X is the nonterminating value of type X, which is the state about which you have the least knowledge
Aka the thing corresponding to the Haskell code `let x = x in x`
, which loops forever if you evaluate it
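For finite stochastic matrices, this "canonical state as fixed point" reading can be sketched by power iteration (a hypothetical illustration, assuming the trace-induced state would be a stationary distribution; for a doubly stochastic matrix this is indeed the uniform / max entropy distribution, matching the intuition above):

```python
import numpy as np

def stationary(k: np.ndarray, p0: np.ndarray, iters: int = 1000) -> np.ndarray:
    """Approximate a fixed point p = p @ k by iterating the kernel k
    from the initial state p0 (assumes the chain converges)."""
    p = p0
    for _ in range(iters):
        p = p @ k
    return p

# A doubly stochastic kernel: its stationary state is uniform.
k = np.array([[0.9, 0.1],
              [0.1, 0.9]])
print(stationary(k, np.array([1.0, 0.0])))  # converges to [0.5, 0.5]
```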
Hi @Jules Hedges, do you have a traced Markov category that does this? Would be very interesting. I am looking at the traced premonoidal paper again. Proposition 3.2.1 seems to specialize to Markov categories to give a correspondence between trace operators on Markov categories and a special kind of fixed-point operator (Def 3.3). This is a generalization of Hasegawa/Hyland's characterization for cartesian cats. Now for the kind of trace you are talking about, with trace=uniform distribution, from that perspective, I'm worried about the "diagonal property" (Def 3.3(3))...
I don't, but nor have I tried, it's just an initial idea for a construction. Failure is definitely an option
It’s interesting. I conjecture that there’s also a non-trivial traced Markov category that just does the mundane Haskell-like thing. But that's not inconsistent with your idea -- there could be very different traces around.
Also, speculating on the topic of uniform distributions. I sort-of hope that "synthetic probability" might help us understand improper priors, in the same way that SDG helps us understand infinitesimals. For example, Dario's Markov category GaussEx (here) does have a uniform distribution on all of ℝ. (Or rather, it has a morphism that could be thought of like that.)
Interesting thread! I've just started looking into traces because they seem relevant to comb factorizations as used here. In order for comb application to be well-defined, we want an implicational axiom like the following to hold
combs.jpeg
This axiom can be proven in every Markov category that embeds faithfully in a compact closed category (as in the paper), or even in a traced category.
Now, the axiom is in fact invalid when we have atomless distributions around. Take the state to be the uniform distribution on [0,1], and define the remaining maps by indicator functions (Iverson brackets). Then the left-hand sides are equal, but the right-hand sides are not. This shows that a whole host of Markov categories cannot be embedded faithfully in any traced category, for example BorelStoch, Stoch, or nominal sets with name generation.
Tagging @Nathaniel Virgo because I remember we talked about comb factorizations.
Dario Stein said:
Interesting thread! I've just started looking into traces because they seem relevant to comb factorizations as used here. In order for comb application to be well-defined, we want an implicational axiom like the following to hold [...]
For what it's worth I think you can talk about comb factorisations in a more general setting by defining 2-combs as optics.
The comb factorisation from that paper then factorises a state into an optic and a morphism (equivalently, into a single optic).
Since you can have unequal optics which induce equal morphisms, this sidesteps the need for the extra coherence axiom for application.
Interesting, is the optics approach sufficient to replicate the arguments in the paper? In the end, the goal is to prove certain equalities between morphisms, no?
I haven't worked through the arguments in the final section in detail, but I think it all works the same. In either case "2-combs" are just an intermediate structure used in the definition of an intervention, that you end up mapping back into the underlying Markov category
There's maybe a neater way to see the relationship between the notion of combs as optics and the notion of combs using a trace in the paper, using the fact that optics somehow represent feedback operators on a category. That is, a feedback structure on a category (i.e. a trace without the yanking axiom) is equivalently a factorisation of the hom functor through the category of optics. But I don't see how to make a precise connection right now
Hi @Dario Stein, the theorem gives a nice new angle, thanks.
The trace I think we have implemented in the LazyPPL library is given by freezing the random seed and then using Haskell's recursion. I am using the fact that every type is pointed (lifted).
We also have equality test and uniform distributions in the library, so I wonder whether you have now shown that it is actually not a trace, or if there is a subtle point about divergence.
(The reason I haven't actually proved whether it is a trace is that I am still not sure what the categorical semantics of Haskell should be, once you consider non-termination.)
Sorry for being late to this thread, @Dario Stein. Thank you for this! Indeed it completely answers the question I was asking people at ACT - in the negative, which is really surprising to me.
What I was trying to do was to start from the view that combs are optics, but then show that this reduces to something simpler in the case where the underlying category has conditionals, in a similar way to how optics reduce to lenses in a cartesian monoidal category. To make that work I thought I seemed to need an axiom similar to the one you posted. There might be some other way to do it now I know that that doesn't work, though, I'm not sure.