Category Theory
Zulip Server
Archive

You're reading the public-facing archive of the Category Theory Zulip server.
To join the server you need an invite. Anybody can get an invite by contacting Matteo Capucci at name dot surname at gmail dot com.
For all things related to this archive refer to the same person.

Stream: theory: categorical probability

Topic: UP for conditionals?

Matteo Capucci (he/him) (Oct 18 2023 at 06:21):

Is there a known perspective on what conditionals are, categorically speaking? To the best of my knowledge, in categorical probability one treats having conditionals as a property of a Markov category, but it seems to be disconnected from known kinds of structure: having conditionals is just having a (natural?) map ${\cal C}(A, X \otimes Y) \to {\cal C}(A \otimes X, Y)$.
In particular, I'd be stoked to learn conditionals have/arise from some universal property.

Nathaniel Virgo (Oct 18 2023 at 07:21):

[I guess the following isn't that relevant in retrospect, as the question is about conditionals in general, but maybe it's interesting anyway]

I'm also really interested in this. Here's one thing that might be relevant: consider a strongly representable Markov category (see version 2 of the preprint of the representable Markov categories paper, but not later versions). $\mathbf{BorelStoch}$ is such a category and so is the Kleisli category of the distribution monad. Strongly representable means that for any map $f:A\to X\otimes Y$ there is a unique map $f^\square : A\to X\otimes P(Y)$ such that $f^\square$ has the property of being "deterministic given $X$" (see the paper for the definition) and such that $f^\square;(\mathrm{id}_X\otimes \mathrm{samp}_Y) = f$, where $\mathrm{samp}_Y$ is the sampling map.

In particular we can apply this to $\mathrm{samp}_{X\otimes Y}$, which gives us a unique map $h_{X,Y}:P(X\otimes Y)\to X\otimes P(Y)$ with the properties that $h_{X,Y}$ is deterministic given $X$ and $h_{X,Y};(\mathrm{id}_X\otimes \mathrm{samp}_Y) = \mathrm{samp}_{X\otimes Y}$. I don't know if that map can be defined in terms of a universal property exactly, but it feels like it's close. The map takes a joint distribution over $X$ and $Y$ and returns a sample from $X$ together with the conditional distribution of $Y$ given that sample. The map $f^\square$ is then given by $f^\sharp;h_{X,Y}$, where $f^\sharp:A\to P(X\otimes Y)$ is the deterministic counterpart of $f$.

(The map $h_{X,Y}$ is related to what Jacobs calls hyper normalisation. Hyper normalisation is more or less the deterministic version of $h_{X,Y}$, so it has type $P(X\otimes Y) \to P(X\otimes P(Y))$. I have to thank an anonymous reviewer of my ACT paper for pointing out that strongly representable Markov categories and hyper normalisation should be related.)
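
For finitely supported distributions, the behaviour of $h_{X,Y}$ described above can be sketched concretely. This is only an illustration under the assumption that we work in the Kleisli category of the finite distribution monad; the function names `h` and `sample_second` and the example numbers are hypothetical and not taken from the paper.

```python
from collections import defaultdict
from fractions import Fraction


def h(joint):
    """Finite sketch of h_{X,Y}: P(X (x) Y) -> X (x) P(Y).

    `joint` maps (x, y) pairs to probabilities. The result is a distribution
    over pairs (x, conditional), where `conditional` is the distribution of Y
    given x, stored as a sorted tuple of (y, probability) so it is hashable.
    """
    marginal = defaultdict(Fraction)
    for (x, _y), p in joint.items():
        marginal[x] += p
    out = {}
    for x, px in marginal.items():
        if px == 0:
            continue  # x is never sampled, so no conditional is needed for it
        cond = tuple(sorted(
            (y, p / px) for (x2, y), p in joint.items() if x2 == x and p != 0
        ))
        # The second component is a function of x: "deterministic given X".
        out[(x, cond)] = px
    return out


def sample_second(dist):
    """Post-compose with id_X (x) samp_Y, recovering a joint distribution."""
    joint = defaultdict(Fraction)
    for (x, cond), px in dist.items():
        for y, py in cond:
            joint[(x, y)] += px * py
    return dict(joint)


joint = {
    ("a", 0): Fraction(1, 4),
    ("a", 1): Fraction(1, 4),
    ("b", 0): Fraction(1, 2),
}
# h followed by sampling the second component gives back the joint.
assert sample_second(h(joint)) == joint
```

Values of $x$ with zero marginal probability are simply skipped here, which is where the non-uniqueness of conditionals would otherwise show up.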

Paolo Perrone (Oct 18 2023 at 07:52):

At least Bayesian inverses can be described categorically quite neatly as daggers in a dagger category, see definition 13.8 and proposition 3.9 in Tobias' paper.

Tobias Fritz (Oct 18 2023 at 07:58):

And since general conditionals can be seen as Bayesian inverses in a parametric Markov category, there is a sense in which the formation of general conditionals is a dagger.

Tobias Fritz (Oct 18 2023 at 08:02):

But this requires working with morphisms up to almost sure equality, which is sometimes desirable and sometimes not. If you don't want to do this, then it's morally wrong to consider the formation of conditionals as an operation $\mathcal{C}(A, X \otimes Y) \to \mathcal{C}(A \otimes X, Y)$, since conditionals are not unique, and in the usual Markov categories of interest there is no canonical choice. This happens in discrete probability because of division by zero in $P(y|x) = \frac{P(x,y)}{P(x)}$. It's much more severe with continuous variables, where you'll run into things like the Borel-Kolmogorov paradox if you pretend that conditionals are unique.
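
The discrete case of this non-uniqueness can be checked directly. The following is a minimal sketch with made-up numbers: a conditional is any kernel satisfying $P(x,y) = P(x)\,P(y|x)$, and wherever $P(x) = 0$ that equation puts no constraint on it. The helper name `is_conditional` is hypothetical.

```python
from fractions import Fraction


def is_conditional(joint, cond, xs, ys):
    """Check the defining equation P(x, y) == P(x) * cond[x][y] for all x, y."""
    for x in xs:
        px = sum(joint.get((x, y), Fraction(0)) for y in ys)
        for y in ys:
            if joint.get((x, y), Fraction(0)) != px * cond[x][y]:
                return False
    return True


xs, ys = ["x0", "x1"], ["y0", "y1"]
# All the mass sits on x0, so P(x1) = 0.
joint = {("x0", "y0"): Fraction(1, 3), ("x0", "y1"): Fraction(2, 3)}

on_x0 = {"y0": Fraction(1, 3), "y1": Fraction(2, 3)}
# Two kernels that agree on x0 but disagree on the null point x1.
cond_a = {"x0": on_x0, "x1": {"y0": Fraction(1), "y1": Fraction(0)}}
cond_b = {"x0": on_x0, "x1": {"y0": Fraction(0), "y1": Fraction(1)}}

assert is_conditional(joint, cond_a, xs, ys)
assert is_conditional(joint, cond_b, xs, ys)
assert cond_a != cond_b  # both are conditionals, yet they differ
```

The two kernels are almost surely equal with respect to the marginal on $X$, but not equal as morphisms.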

Tobias Fritz (Oct 18 2023 at 08:09):

That being said, there is some naturality inherent to conditionals. In order to state it, let me copy and paste from a draft, starting with a string-diagrammatic notation for conditionals introduced by Bart Jacobs:
image.png

Tobias Fritz (Oct 18 2023 at 08:10):

The idea is to put the dashed box around in order to clarify that the bent wire here is different from a cap in a compact closed category.

Tobias Fritz (Oct 18 2023 at 08:11):

We then have that conditioning twice is a.s. equal to conditioning in one go:
image.png

Tobias Fritz (Oct 18 2023 at 08:12):

Conditioning is natural up to a.s. equality with respect to post-composition:
image.png

Tobias Fritz (Oct 18 2023 at 08:13):

and similarly with respect to pre-composition by deterministic morphisms:
image.png

Tobias Fritz (Oct 18 2023 at 08:14):

But it's not natural with respect to pre-composition by general morphisms!

Tobias Fritz (Oct 18 2023 at 08:22):

This non-naturality manifests itself as Simpson's paradox! In the Wikipedia gender bias example, the failure of naturality is with respect to the distribution over departments: if you condition with that distribution inside the dashed box, you get the large bias in favor of men; whereas if you leave that distribution outside of the dashed box, which amounts to conditioning for each department separately, then you get the small bias in favor of women. (I wish there was a less politically charged example on that page, but ok.)
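
The two ways of conditioning can be compared numerically. The sketch below uses hypothetical admission counts chosen only to exhibit the effect (they are not the data from the Wikipedia example), and the function names are likewise assumptions made for illustration.

```python
from fractions import Fraction

# counts[(group, dept)] = (applicants, admitted); hypothetical numbers.
counts = {
    ("M", "A"): (80, 48),
    ("W", "A"): (20, 13),
    ("M", "B"): (20, 4),
    ("W", "B"): (80, 20),
}


def rate_per_dept(group, dept):
    """Condition for each department separately: P(admitted | group, dept)."""
    applicants, admitted = counts[(group, dept)]
    return Fraction(admitted, applicants)


def rate_aggregated(group):
    """Condition with the department distribution included: P(admitted | group)."""
    applicants = sum(n for (g, _d), (n, _k) in counts.items() if g == group)
    admitted = sum(k for (g, _d), (_n, k) in counts.items() if g == group)
    return Fraction(admitted, applicants)


# Within every department, group W has the higher admission rate...
assert all(rate_per_dept("W", d) > rate_per_dept("M", d) for d in ("A", "B"))
# ...but after aggregating over departments, group M does.
assert rate_aggregated("M") > rate_aggregated("W")
```

The reversal comes from the two groups applying to the departments in different proportions, which is exactly the distribution that sits either inside or outside the dashed box.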

Tobias Fritz (Oct 18 2023 at 08:28):

So to summarize, the categorical properties of conditioning are a very interesting question, and it's something that has been nagging us for a long time. The hypernormalisation mentioned by @Nathaniel Virgo and the conditioning-as-dagger mentioned by @Paolo Perrone are two ways to get some nice categorical properties, but I'm not sure if either of them actually characterizes conditionals. (Also hypernormalisation has only been worked out in the discrete case so far.) So to really characterize conditionals, there may be no way around considering their existence as a mere existence without any canonical choices and without full naturality. In particular, I don't see what a universal property could look like.

Tobias Fritz (Oct 18 2023 at 08:39):

I hope that I'm wrong about hypernormalisation not characterizing conditionals -- @Nathaniel Virgo's construction of $h_{X,Y}$ looks very intriguing!

Paolo Perrone (Oct 18 2023 at 08:56):

By the way, many structures in probability fail to satisfy a universal property, but still have a categorical conceptual description ("moral" in the sense of Cheng).
Famous examples are the product probability, which is not a categorical product, and the conditional product, which is not a pullback.

Is there any insight on why conditionals should satisfy a universal property, maybe up to a.s. equality?
I can see how conditional expectation is morally universal, but not quite conditionals.
Does anybody have ideas on what they should represent?
(Maybe the OP has something specific in mind?)

fosco (Oct 18 2023 at 09:21):

Paolo Perrone said:

By the way, many structures in probability fail to satisfy a universal property, but still have a categorical conceptual description ("moral" in the sense of Cheng).

yet, a general meta-principle most category theorists apply is that no structure prone to be described categorically can really "fail" to satisfy a universal property; it can have a lax one, a weak one, a higher one... but if it doesn't have a universal property, it just doesn't exist (=it's not "well-formed")

fosco (Oct 18 2023 at 09:24):

I am saying this because the obstruction @Paolo Perrone is talking about is what makes categorical probability theory fascinating to my eye: is it a counterexample to this long established meta-principle?!

Paolo Perrone (Oct 18 2023 at 09:36):

fosco said:

I am saying this because the obstruction Paolo Perrone is talking about is what makes categorical probability theory fascinating to my eye: is it a counterexample to this long established meta-principle?!

To clarify: I'm not saying that, for example, the tensor product of probabilities does not satisfy any nontrivial universal property. For example, since the Giry monad is commutative, the tensor product of algebras classifies binary maps that are affine (under mixtures) separately in both arguments, kinda like the tensor product of vector spaces. The resulting universal map, for free algebras, is exactly the map forming the product probability $PX\times PY\to PX\otimes_P PY\cong P(X\times Y)$.
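
The "affine separately in both arguments" behaviour of the product-probability map is easy to check in the finite case. This is only a sketch with made-up distributions; the helper names `product` and `mix` are hypothetical.

```python
from fractions import Fraction


def product(p, q):
    """Product probability: (p, q) |-> p (x) q on pairs."""
    return {(x, y): px * qy for x, px in p.items() for y, qy in q.items()}


def mix(t, p, q):
    """Convex mixture t*p + (1-t)*q of two finite distributions."""
    keys = set(p) | set(q)
    return {k: t * p.get(k, 0) + (1 - t) * q.get(k, 0) for k in keys}


p1 = {"a": Fraction(1, 2), "b": Fraction(1, 2)}
p2 = {"a": Fraction(1, 4), "b": Fraction(3, 4)}
q1 = {0: Fraction(1, 3), 1: Fraction(2, 3)}
q2 = {0: Fraction(1), 1: Fraction(0)}
t = Fraction(1, 5)

# Affine in the first argument with the second held fixed...
assert product(mix(t, p1, p2), q1) == mix(t, product(p1, q1), product(p2, q1))
# ...and in the second argument with the first held fixed.
assert product(p1, mix(t, q1, q2)) == mix(t, product(p1, q1), product(p1, q2))
# But it is not affine in both arguments jointly.
assert product(mix(t, p1, p2), mix(t, q1, q2)) != mix(
    t, product(p1, q1), product(p2, q2)
)
```

The failure of joint affinity is the same phenomenon as a bilinear map of vector spaces not being linear on the direct sum.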
One can always find interesting universal properties if one looks the right way -- categorical probability is not an exception to this, at least not in my opinion.
However, these universal properties tend to be not immediately obvious, and in the everyday practice of probability theory they remain somewhat "hidden". One could ask whether the reason for this is intrinsically mathematical or rather historical/sociological because of the way probability theory developed -- I don't have an answer to that. What is true, though, is that when we formalize a probabilistic concept categorically, things such as universality, functoriality or naturality tend to come up only later in the story.

Tobias Fritz (Oct 18 2023 at 09:39):

However, there certainly are other constructions that can be described categorically but don't enjoy any clear universal property. For example, the tensor product of Hilbert spaces comes to mind: although (in finite dimensions) its underlying vector space has the usual universal property of a tensor product of vector spaces, it's not at all clear what a universal property for the inner product on a tensor product Hilbert space might be. I bet that one can find many other natural examples of monoidal categories in which the monoidal structure doesn't have any clear and nontrivial universal property.

Paolo Perrone (Oct 18 2023 at 09:41):

Still I wonder what the OP has in mind with universality of conditionals.
Often, whenever someone has a "feeling" about a certain probabilistic concept, a new piece of categorical probability is about to be discovered. (Categorical probability is still that new and unexplored!) Sometimes it turns out to be exactly what one had in mind, sometimes it's fully surprising.

Matteo Capucci (he/him) (Oct 19 2023 at 09:27):

Thanks a lot for the thoughtful replies!

Matteo Capucci (he/him) (Oct 19 2023 at 10:10):

I have a feeling double categorical UPs might help here... In particular Markov categories suffer from the vice of being a 'non-categorical' structure: the comonoid structure (to be fair, the comagma structure!) isn't preserved by morphisms, so it seems only natural that they don't play nicely with mapping properties. On the other hand, the fact that so many things are characterized only up to a.s. equality makes me suspect a.s. equality plays the role of higher equivalence and that such characterizations 'up to equivalence' are really UPs in disguise (dispelling @fosco's fears).

These two facts, together with the special status of deterministic maps, had me thinking for a long time that Markov categories should really be organized as cartesian double categories where loose arrows ('channels'/'kernels') are any maps, tight ('deterministic') maps are restricted to deterministic ones, and such that a square is filled with a 2-cell iff the square commutes almost surely.

Conversely, a cartesian double category is Markov iff (1) it has all companions (this yields a gs-category) and (2) every square involving a delete morphism on the bottom commutes (or some other equivalent condition to make delete natural in the loose direction too).

Notice, in this way, we already recovered one universal property, namely that of $\otimes$, which gets lost when mixing deterministic maps and channels.

Aside: representable Markov categories also show a natural double categorical inclination, through the notion of strictly representable vertical arrows (see here, slide 22). This would put the theory on a footing which highlights even more the similarity between presheaves and distributions, which Paolo worked on.

Now @Tobias Fritz's interesting series of screenshots shows that, in Markov categories with conditionals, channels behave a bit like undirected morphisms, also a typical phenomenon in double categories (think: relations, profunctors, etc.). Then perhaps a universal structure to look for would be that of [[compact double category]] (or half of that, thus 'teleological' double categories). So if I had time now, I would go fishing there for a nice universal property.

Tobias Fritz (Oct 19 2023 at 10:27):

Those are interesting thoughts for sure! But what do you mean by "a square is filled with a 2-cell iff the square commutes almost surely"? Almost surely with respect to what?

Matteo Capucci (he/him) (Oct 19 2023 at 10:35):

Ah, good point... Maybe strictly commuting is enough then?

Matteo Capucci (he/him) (Oct 19 2023 at 10:38):

Yeah that's it... I get periodically confused about this. Equality of morphisms is enough to talk about a.s. equality since it means equality after precomposition and copy.
Though if you have channels $f,g:X \to Y$ you can still talk about $1_X$-almost sure equality, no? Or is $\Delta;(f \times Y) = \Delta;(g \times Y)$ equivalent to $f=g$?

Paolo Perrone (Oct 19 2023 at 10:39):

In recent work, Noé Ensarguet and I showed that in a suitable dagger category, the a.s. deterministic morphisms form the subcategory of dagger epimorphisms. (I believe those satisfy a universal property, as per Martti Karvonen's thesis.)

Tobias Fritz (Oct 19 2023 at 12:11):

And concerning "channels behave a bit like undirected morphisms", this is not the case in the usual Markov categories of interest: there's no canonical morphism $Y \to X$ associated to a morphism $X \to Y$. Because if there was, then in particular the deletion morphism $X \to I$ would have a canonical counterpart $I \to X$ for every object $X$. That is, every measurable space would come equipped with a canonical probability measure. I think that this is "morally wrong" even for finite $X$ already.

Tobias Fritz (Oct 19 2023 at 12:15):

So to get the dagger, one needs to apply the "ProbStoch" construction to get to the category of probability spaces and a.s. equivalence classes of channels. The problem with this is that the result isn't a Markov category anymore -- it's just a symmetric monoidal dagger category.

Tobias Fritz (Oct 19 2023 at 12:26):

So for now I'm still skeptical about a double categorical approach. BTW the universal property for $\otimes$ isn't "lost" when mixing deterministic maps and channels! For one thing, the tensor is the categorical product on the subcategory of deterministic morphisms. But more interestingly, also the tensor product of nondeterministic morphisms is encoded indirectly via universal properties, at least in a representable Markov category: there, the associated monad $P$ is automatically a commutative monad, and in particular we have the formation of products map $PX \times PY \to P(X \otimes Y)$ which Paolo mentioned earlier encoded in its structure coming from the Kleisli adjunction.

Nathaniel Virgo (Oct 19 2023 at 13:00):

Matteo Capucci (he/him) said:

Or is $\Delta;(f \times Y) = \Delta;(g \times Y)$ equivalent to $f=g$ ?

It is: just post-compose both sides with $\mathrm{del}_Y\times Y$.
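
Spelled out under one reading of the notation, the step is to discard the extra wire and use the counit law for copy. The assumptions here are that $f,g:X\to Y$, that the second tensor factor in the equation is the identity on the copied input wire (written $\mathrm{id}_X$ below), and that unitors are suppressed; which factor carries the deletion depends on how the wires are ordered and named.

```latex
\begin{align*}
f &= \Delta_X ; (\mathrm{id}_X \otimes \mathrm{del}_X) ; f
   && \text{(counit law for copy)} \\
  &= \Delta_X ; (f \otimes \mathrm{id}_X) ; (\mathrm{id}_Y \otimes \mathrm{del}_X)
   && \text{(interchange)} \\
  &= \Delta_X ; (g \otimes \mathrm{id}_X) ; (\mathrm{id}_Y \otimes \mathrm{del}_X)
   && \text{(hypothesis)} \\
  &= \Delta_X ; (\mathrm{id}_X \otimes \mathrm{del}_X) ; g \;=\; g .
\end{align*}
```

So almost sure equality with respect to an identity morphism collapses to plain equality, and no naturality of $\mathrm{del}$ beyond the comonoid laws is needed.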

Matteo Capucci (he/him) (Oct 21 2023 at 15:14):

Tobias Fritz said:

And concerning "channels behave a bit like undirected morphisms", this is not the case in the usual Markov categories of interest: there's no canonical morphism $Y \to X$ associated to a morphism $X \to Y$. Because if there was, then in particular the deletion morphism $X \to I$ would have a canonical counterpart $I \to X$ for every object $X$. That is, every measurable space would come equipped with a canonical probability measure. I think that this is "morally wrong" even for finite $X$ already.

Totally agree, that's why I said 'a bit'. It was an observation following Jacobs' convention of turning around wires to denote conditionals.

Matteo Capucci (he/him) (Oct 21 2023 at 15:20):

Tobias Fritz said:

So for now I'm still skeptical about a double categorical approach. BTW the universal property for $\otimes$ isn't "lost" when mixing deterministic maps and channels! For one thing, the tensor is the categorical product on the subcategory of deterministic morphisms. But more interestingly, also the tensor product of nondeterministic morphisms is encoded indirectly via universal properties, at least in a representable Markov category: there, the associated monad $P$ is automatically a commutative monad, and in particular we have the formation of products map $PX \times PY \to P(X \otimes Y)$ which Paolo mentioned earlier encoded in its structure coming from the Kleisli adjunction.

Fair enough. After all, I don't have much to show in favor of the double categorical approach except my inclination for double categories.

Tobias Fritz (Oct 21 2023 at 15:34):

BTW I'm also a big fan of double categories, and of equipments in particular! So I'd love to see them getting put to good use in a probability context as well.

Paolo Perrone (Jan 05 2024 at 10:42):

Paolo Perrone said:

However, these universal properties tend to be not immediately obvious...

If someone wants to see some universal properties that come up in probability theory, my CT talk recording is out, "Universal Properties in Probability Theory":
https://www.youtube.com/watch?v=gmSlbgmLyVQ&list=PLu4STGsfbix9l6rPxGsjG6Gl822k1hj2X&index=28