Category Theory
Zulip Server
Archive

You're reading the public-facing archive of the Category Theory Zulip server.
To join the server you need an invite. Anybody can get an invite by contacting Matteo Capucci at name dot surname at gmail dot com.
For all things related to this archive refer to the same person.


Stream: learning: questions

Topic: Markov categories and enrichment


view this post on Zulip Nathaniel Virgo (Jun 15 2020 at 04:17):

For any "reasonable" category of measurable spaces and Markov kernels, given morphisms AfBA\xrightarrow{f} B and AgBA\xrightarrow{g} B and λ[0,1]\lambda \in [0,1], we can form a new morphism A(1λ)f+λgBA\xrightarrow{(1-\lambda)f + \lambda g} B, which informally corresponds to flipping a biased coin and doing ff with probability 1λ1-\lambda and gg with probability λ\lambda.

Question 1: is there a sense in which this is true of all Markov categories, or is it a special property that only some of them have?

Question 2: this kind of operation on morphisms seems to suggest there's some kind of enrichment going on, but I don't know very much about enriched category theory. Can someone help me figure out exactly what's going on? I'm still somewhat new to category theory so the more explicitly it can be spelled out the more it will help me.

Can we say that (some) Markov categories are enriched in convex spaces? Or does it have something to do with self-enrichment?
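
For concreteness, here is a minimal Haskell sketch of this mixing operation for finitely supported kernels; the `Dist` type and the `mix` function are hypothetical stand-ins, not any particular library:

```haskell
-- Finitely supported distributions as weighted lists; weights are assumed
-- non-negative and summing to 1.
newtype Dist a = Dist [(a, Double)] deriving Show

-- A Markov kernel from a to b, in the finite/discrete setting.
type Kernel a b = a -> Dist b

-- (1-λ)·f + λ·g: flip a λ-biased coin, then run f or g accordingly.
mix :: Double -> Kernel a b -> Kernel a b -> Kernel a b
mix lam f g a = Dist (scale (1 - lam) (f a) ++ scale lam (g a))
  where scale c (Dist xs) = [(b, c * p) | (b, p) <- xs]
```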

view this post on Zulip Evan Patterson (Jun 15 2020 at 04:30):

  1. Not all Markov categories have this property. For example, the category of sets and multivalued functions (a subcategory of $\mathsf{Rel}$) is a Markov category, but you can't take convex combinations of such morphisms.

  2. In "probabilistic" Markov categories, particularly the prototypical categories of Markov kernels, you can take convex combinations of morphisms. I figure that being enriched in convex spaces should mean not only that the hom-sets are actually convex spaces, but that composition is convex-bilinear (convex-linear in each argument, with the other held fixed). Composition of Markov kernels indeed satisfies that property. So this might even be a good definition of a "probabilistic" Markov category. I don't know if anyone has studied this systematically, but if not, somebody should!

view this post on Zulip Nathaniel Virgo (Jun 15 2020 at 04:42):

I thought about Rel, but I figured you might be able to define $(1-\lambda)f + \lambda g$ as something like the set union of the relations $f$ and $g$ if $\lambda\in (0,1)$, or $f$ if $\lambda = 0$ or $g$ if $\lambda=1$. I think that meets the axioms for a convex space. (But I guess one could easily add an additional requirement to disallow that sort of thing, if desired.)

view this post on Zulip Evan Patterson (Jun 15 2020 at 05:04):

Oh, interesting. Maybe that works. Two further thoughts then. First, $\mathsf{Set}$, like any cartesian category, is also a Markov category of a very boring kind, but you can't play that trick in $\mathsf{Set}$. Also, there is a theorem, reported in arXiv:1105.1270 by Capraro and Fritz and going back to Marshall Stone, that any convex space embeds into a real vector space provided it satisfies a certain cancellation property. I believe that condition would rule out the examples coming from $\mathsf{Rel}$.

view this post on Zulip Nathaniel Virgo (Jun 16 2020 at 04:14):

I'm wondering if you can expand on this a bit:

Evan Patterson said:

I figure that being enriched in convex spaces should mean not only that the hom-sets are actually convex spaces, but that composition is convex-bilinear (convex-linear in each argument, with the other held fixed).

I guess it means we're thinking of convex spaces as a monoidal category with a monoidal product that behaves like the tensor product of vector spaces. Do you know if that's been worked out anywhere? (@Tobias Fritz, do you know?)

view this post on Zulip Arthur Parzygnat (Jun 16 2020 at 08:16):

@Nathaniel Virgo Yes, the general situation is as follows. For enriched categories, since $\mathrm{Hom}(A,B)$ is an object of some category $\mathcal{E}$ (as opposed to $\mathbf{Set}$), if you want to compose morphisms, you have to be able to construct a composition map, which is a morphism $\mathrm{Hom}(A,B)\otimes\mathrm{Hom}(B,C)\xrightarrow{\circ}\mathrm{Hom}(A,C)$ in your enriching category $\mathcal{E}$, where $\otimes$ is a monoidal product in $\mathcal{E}$. Since you're working with convex spaces, you have to choose your morphisms carefully. Convex/concave maps don't compose, but affine ones do. @Sam Tenka discusses the monoidal structure on this category in his MIT lecture https://youtu.be/Td_7_7MBHcU (check around 24 minutes into the video). The fact that @Evan Patterson said "bi-linearity" comes from the fact that if you fix either coordinate (which corresponds to a composite such as $\mathrm{Hom}(A,B)\otimes 1\xrightarrow{\mathrm{id}\otimes g}\mathrm{Hom}(A,B)\otimes\mathrm{Hom}(B,C)\xrightarrow{\circ}\mathrm{Hom}(A,C)$, where $g$ is a fixed "element" of $\mathrm{Hom}(B,C)$), you still have a morphism in $\mathcal{E}$. (BTW, check 35 minutes into the video for examples.)

view this post on Zulip Nathaniel Virgo (Jun 16 2020 at 08:46):

Right, thank you, this is all very helpful. I remember that from Sam's talk. It's a great talk and I've watched it online several times, but being still somewhat new to category theory I haven't been able to understand all of the details. (Maybe Sam can help with my question below.)

To give a precise question: for abstract convex spaces $A, B$ (as defined at https://ncatlab.org/nlab/show/convex+space for example), what exactly is $A\otimes B$? Sam says it's defined by the hom-tensor adjunction, which makes sense, and in his examples it's different from the categorical product, which seems reasonable. But can we construct it explicitly, in terms of the sets $A$ and $B$ and their associated convex combination operations?

view this post on Zulip Evan Patterson (Jun 16 2020 at 18:18):

I'm not sure about abstract convex spaces---I should watch Sam's talk---but for closed convex sets $A$ and $B$ in Euclidean space the tensor product is called the projective tensor product. It is the closed convex hull of the tensor products $x \otimes y$ of vectors $x \in A$ and $y \in B$. (For closed convex sets containing the origin, there is another tensor product, the injective tensor product, given by polarity.) See Sec 4.1.4 of the book Alice and Bob Meet Banach, available for download here.

view this post on Zulip Evan Patterson (Jun 16 2020 at 18:26):

I really feel like this stuff should be better known and easier to find. For whatever reason, the classic books on convex analysis don't talk about tensor products.

view this post on Zulip Sam Tenka (Jun 17 2020 at 00:17):

Hi! I am also a beginner in category theory, but here are my two cents regarding @Nathaniel Virgo 's question. First, I note that what I say will be redundant with @Evan Patterson 's remark. I will be super-explicit, and I don't recommend actually working with tensor products this concretely.

Here is a concrete construction of the tensor product between two convex spaces $A$ and $B$. Let's first say what $A\otimes B$ is as a set. We will also write $A$ and $B$ to denote the underlying sets. Then, as a set, $A\otimes B$ is a quotient of a subset of the set $\mathcal{F} = \mathbb{R}^{A\times B}$ of functions from $A\times B$ to the real line. If $p,q \in \mathcal{F}$ and $x, y \in \mathbb{R}$, then the pointwise linear combination $(a,b) \mapsto x\cdot p((a,b)) + y\cdot q((a,b))$ is also in $\mathcal{F}$. We will write that linear combination as $x\cdot p + y\cdot q$. Helped by associativity, we will more generally write $\sum_i c_i \cdot p_i \in \mathcal{F}$ for $c_i \in \mathbb{R}$ and $p_i \in \mathcal{F}$. We will also overload notation by writing $(a,b)$ for the partially evaluated Kronecker delta function $(a,b) \triangleq \big((a^\prime,b^\prime) \mapsto 1\text{ if }(a^\prime,b^\prime)=(a,b)\text{ else }0\big)\in \mathcal{F}$. Later on, we will write (the equivalence class corresponding to) that delta function as $a \otimes b$.

Alright. Which subset do we take? Well, we take the subset $\mathcal{S}$ of all those functions with finite support and non-negative coefficients and whose values sum to $1$. (So far, we've just found a set-theoretic way to talk about the finitary formal convex combinations over $A \times B$.) Working in the category of convex spaces, we may equivalently have written the coproduct ${\star}^{\coprod (A\times B)}$ indexed by the elements of the set $A\times B$, where $\star$ is terminal. But I want to be concrete, so I'll avoid this viewpoint.

The next step is to take a quotient of $\mathcal{S}$. We mod out by the equivalence relations that we would expect due to bilinearity. Specifically, whenever $p \in \mathcal{F}$ and $x\cdot(a,b) \in \mathcal{F}$ are such that $p + x\cdot(a,b) \in \mathcal{S}$, we have the relations $p + (xa,b) \sim p + x\cdot(a,b) \sim p + (a,xb)$. Also, whenever $p\in \mathcal{F}$ and $(a + \alpha, b) \in \mathcal{F}$ are such that their sum is in $\mathcal{S}$, we have $p + (a+\alpha,b) \sim p + (a,b) + (\alpha,b)$. Likewise, we have $p + (a,b+\beta) \sim p + (a,b) + (a,\beta)$. We take the equivalence relation generated by these relations, then mod out. And that's the tensor product as a set!

That's the underlying set, and the convex-mixture operation is defined as we would expect. I could write this out more explicitly if you'd like, but by default I am too lazy to do so :slight_smile:

It is probably useful to compare this construction to that of the tensor product of modules (e.g. as found in Atiyah's introduction to commutative algebra). It is the same idea.
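
As a supplement, here is a Haskell sketch of the subset $\mathcal{S}$ above (the finitely supported formal convex combinations); the quotient by the bilinearity relations is left abstract, since deciding the generated equivalence takes real work. All names here are illustrative:

```haskell
import qualified Data.Map as Map  -- from the containers package

-- Formal finitary convex combinations over a set of generators g:
-- finitely supported weight functions, intended to have non-negative
-- weights summing to 1 (not enforced by the type).
newtype Formal g = Formal (Map.Map g Double) deriving Show

-- The "delta function" at a generator; for g = (a, b) this is the
-- representative later written a ⊗ b.
delta :: g -> Formal g
delta x = Formal (Map.singleton x 1)

-- Pointwise convex mixture (1-λ)·p + λ·q, which stays inside S.
mixF :: Ord g => Double -> Formal g -> Formal g -> Formal g
mixF lam (Formal p) (Formal q) =
  Formal (Map.unionWith (+) (Map.map ((1 - lam) *) p) (Map.map (lam *) q))
```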

view this post on Zulip Sam Tenka (Jun 17 2020 at 01:02):

Here's an example of tensor product for convex algebras not specialized to the "subset of euclidean space" case.
We take $A = \{\text{no}, \text{maybe}, \text{yes}\}$, which is the familiar real unit interval $I=[0,1]$ with its interior collapsed into a point. For example, in this convex space, we have $(2/3)\,\text{maybe} + (1/3)\,\text{yes} = \text{maybe}$ and $(4/5)\,\text{no} + (1/5)\,\text{yes} = \text{maybe}$ and, of course, $(100/100)\,\text{yes} = \text{yes}$.
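
To pin down the mixing rule, a tiny hypothetical Haskell rendering of this convex space (with `Maybe'` to avoid clashing with the Prelude's `Maybe`):

```haskell
data A = No | Maybe' | Yes deriving (Eq, Show)

-- Convex mixture (1-λ)·x + λ·y in [0,1]-with-collapsed-interior: a genuine
-- mixture (both weights nonzero) of two distinct points lands in the interior.
mixA :: Double -> A -> A -> A
mixA lam x y
  | lam == 0  = x
  | lam == 1  = y
  | x == y    = x
  | otherwise = Maybe'

-- e.g. mixA (1/3) Maybe' Yes == Maybe'   -- (2/3)·maybe + (1/3)·yes
--      mixA (1/5) No Yes     == Maybe'   -- (4/5)·no + (1/5)·yes
```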

Then what is $A \otimes I$? Well, using bilinearity, we can group like terms in any given element of this tensor product in order to write that element as $\text{yes} \otimes a + \text{maybe} \otimes b + \text{no} \otimes c$, where $a,b,c\in I$. When we mod out by further relations such as $\text{yes} \otimes (1/3) + \text{no} \otimes (2/3) = \text{maybe} \otimes (2/3) + \text{no} \otimes (1/3)$, we get the desired tensor product. (Here, $a \otimes b$ is the formal linear combination with one term, a pair $(a,b)$. We also write it as $1 \cdot (a,b)$.)

Intuitively, we get a tetrahedron "sandwich" whose two outer "breads" are intervals but whose "filling" is all glued together in mayonnaise.

view this post on Zulip Evan Patterson (Jun 17 2020 at 02:01):

Hi Sam, thanks, that's very helpful. I think I get the basic idea of the construction for convex spaces, although I don't understand the notation $p + x(a,b) \in S$. Anyway, is this construction written down somewhere in the literature?

view this post on Zulip Sam Tenka (Jun 17 2020 at 02:54):

Hi Evan, no problem. By $p+x(a,b)$, I mean a formal convex combination of a bunch of pairs in $A \times B$ such that one of the terms is $(a,b)$ and has coefficient $x \in [0,1]$. The $p$ stands for the remaining terms. I bet this construction is written down somewhere, but I don't know where. I bet @Tobias Fritz would know?

My involvement in categorical probability was to think about it for fun rather than the more scholarly and proper route of reading about it. When I first asked @David Spivak, he said he hadn't encountered work on categorical probability, so I took that as an excuse not to do a deeper literature search.

view this post on Zulip Nathaniel Virgo (Jul 01 2020 at 08:41):

Hi @Sam Tenka , sorry for the late reply. I also have a notation issue: what do you mean by $xa$ and $a + \alpha$ in your description of the relation you're modding out by? ($a$ is an element of a convex space, I think, so we can form convex combinations with other elements, but not add them or multiply by scalars.)

view this post on Zulip Nathaniel Virgo (Jul 01 2020 at 08:55):

I have another more general question, too. I guess this is a question about convex spaces in general. Why do we only care about finitary convex combinations?

I ask because in some sense a probability measure is a kind of potentially-infinite generalisation of a convex combination, and so my instinct says that instead of considering finitary formal convex combinations of $A\times B$, we should instead be trying to turn $A\times B$ into a measurable space somehow and consider probability measures over it.

I guess that would extend to the definition of a convex space: instead of defining it in terms of finitary convex combinations, we could define it as a measurable space $S$ together with a map from probability measures over $S$ to $S$ itself (to be interpreted as an expectation, or a generalisation thereof), which has to obey a bunch of axioms that I haven't tried to figure out.

But I don't know if this instinct is a good or a bad one. The question is somewhat similar to what I was trying to ask in the probability as logic thread - we have uncountable unions in the definition of a topology, countable unions in the definition of a $\sigma$-algebra and finitary combinations in the definition of a convex space, and I don't have a good conceptual picture of how those different choices relate to one another.

Have you thought much about how the convex spaces picture of probability will look for, e.g. probability distributions over the reals?

view this post on Zulip Tobias Fritz (Jul 01 2020 at 10:02):

I see this thread only now that @Nathaniel Virgo pointed me to it. So let me start by commenting on the final question first.

Yes, that instinct is completely correct! Working with finitary convex combinations only is like doing probability theory with only finitely supported distributions. It doesn't take you very far.

As you suggest, what one really wants is a "space" in which every probability measure has a designated midpoint or barycenter, such that these specifications of barycenters are suitably compatible with each other. The relevant compatibility condition is exactly that of an Eilenberg-Moore algebra of a probability monad.

There are many variations on this theme, with different probability monads on different categories of spaces. (The above nLab page gives a nice overview, thanks mainly to @Paolo Perrone.) The most basic one is the finitary distribution monad on Set; in this case, the Eilenberg-Moore algebras are exactly the convex spaces in the sense of this thread. Another example is the Kantorovich monad, the algebras of which are exactly the closed convex subsets of Banach spaces! There, it should be intuitive that every probability measure with finite first moment has a sensible notion of barycenter.
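
To make the finitary case concrete, here is a minimal Haskell sketch of the finitary distribution monad together with one Eilenberg-Moore algebra, the barycenter (expectation) map on the reals. The `Dist` type is a hypothetical stand-in, not any particular library:

```haskell
-- Finitely supported distributions as weighted lists (weights assumed
-- non-negative, summing to 1).
newtype Dist a = Dist [(a, Double)]

instance Functor Dist where
  fmap f (Dist xs) = Dist [(f a, p) | (a, p) <- xs]

instance Applicative Dist where
  pure a = Dist [(a, 1)]
  Dist fs <*> Dist xs = Dist [(f a, p * q) | (f, p) <- fs, (a, q) <- xs]

instance Monad Dist where
  Dist xs >>= k = Dist [(b, p * q) | (a, p) <- xs, let Dist ys = k a, (b, q) <- ys]

-- An algebra structure map on the reals: every finitely supported
-- distribution on ℝ has a barycenter, namely its expectation.
expectation :: Dist Double -> Double
expectation (Dist xs) = sum [p * x | (x, p) <- xs]
```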

view this post on Zulip Tobias Fritz (Jul 01 2020 at 11:08):


Concerning tensor products of convex spaces, @Sam Tenka, yes, that construction is well-known, although in slightly different form. The usual way to proceed is to construct the tensor product as a quotient of the free convex space over $A \times B$, or in other words a quotient of the space of formal convex combinations of the elements of $A \times B$. That's exactly what the "subset" part of Sam's construction does. One reference that I know of (thanks to a pointer by Bill Lawvere) is the 1987(?) thesis of Xiao-Qing Meng, where this appears on p.11-12. The convex structure on the hom-sets of the category of probabilistic mappings also plays an important role in the thesis.

Perhaps more enlightening is the general categorical framework which allows for that construction of tensor products: the tensor product of algebras over a commutative monad. For any commutative monad on any monoidal category, one can talk about "bihomomorphisms" between algebras $A \times B \to C$, and the tensor product $A \otimes B$ is the universal object which makes these bihomomorphisms correspond to mere homomorphisms $A \otimes B \to C$. This generalizes the usual universal property of tensor products of vector spaces, and also of the tensor product of convex spaces as mentioned by Sam. The explicit construction of these tensor products as coequalizers is exactly the general abstract version of Sam's/Meng's construction.

view this post on Zulip Tobias Fritz (Jul 01 2020 at 11:28):


Concerning @Sam Tenka's talk, it's been a while since I've watched it, but my recollection is that there are a number of things in there that need to be taken with a big grain of salt (besides much less originality than the talk suggests). One issue with it is that convex spaces do not form a Markov category. In fact, if $X$ is a compact convex subset of a finite-dimensional vector space, then there is a map $X \to X \otimes X$ whose two marginals are the identity if and only if $X$ is a simplex! This is the generalized no-broadcasting theorem. Thus the largest subcategory which has a Markov category structure is essentially the category of simplices and convex maps, or equivalently the category of stochastic matrices. In particular, Sam's treatment of Bayesian inversion for "problyhedra" doesn't work.

view this post on Zulip Tobias Fritz (Jul 01 2020 at 11:53):


Now finally on to the original question about enrichment! I think @Evan Patterson's answer goes very much in the right direction. From my perspective, this question is less about Markov categories as such, but rather about the closely related Kleisli categories of commutative monads, since it's typically the underlying monad which is "responsible" for the enrichment. Here's one abstract construction which explains this in some cases, building on the tensor product of algebras that I've mentioned above. The Kleisli category of the given monad is a full subcategory of the category of Eilenberg-Moore algebras, and it's usually this bigger category of algebras which carries a canonical self-enrichment, namely by virtue of being monoidal closed. This plays an important role in Anders Kock's approach to probability monads.

There are cases in which the category of algebras of a commutative monad carries a canonical enrichment which is not captured by the above. For example, there seems to be a sense in which the category of algebras of any "probability monad" is enriched over convex spaces, since the algebras of a probability monad are themselves convex spaces, and the morphisms are suitably nice convex maps. I don't know the details of how to explain this in general, but it has to do with morphisms of monads. I can try to expand in case anyone wants to see this, or perhaps someone else already knows.


Final note: I know that this has been a mouthful, the nLab references may look scary, and this is the "learning" channel. So we can certainly discuss more of the details if desired, no matter how seemingly basic.

view this post on Zulip Nathaniel Virgo (Jul 01 2020 at 13:40):

Thanks, that's really helpful. I can now look at the definition of an algebra over a monad and understand what it means, in this case at least. (I'd seen it before but hadn't really understood it.)
image.png

The triangle says that if I form a distribution that's completely concentrated on one element, and then take the expectation, then I should get that same element back as the result. The square says that if I have a distribution over distributions over some space, I can either take the mean of each distribution and then take the mean of that distribution, or I can first collapse it into just a distribution over the space and take the mean of that distribution, and I should get the same answer. It's amazing how an example can make an abstract definition become clear.
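
In code, the triangle and square can be written as equality checks for any monad `t` and candidate structure map `alg` (a schematic sketch; for the distribution monad above, `alg` would be the expectation/barycenter map):

```haskell
-- Unit (triangle) law: embedding a point as a Dirac distribution and then
-- taking its barycenter gives the point back.
unitLaw :: (Monad t, Eq x) => (t x -> x) -> x -> Bool
unitLaw alg x = alg (return x) == x

-- Multiplication (square) law: collapsing a distribution of distributions
-- by averaging inside first, or by flattening first, agrees.
multLaw :: (Monad t, Eq x) => (t x -> x) -> t (t x) -> Bool
multLaw alg m = alg (fmap alg m) == alg (m >>= id)
```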

So you're saying, essentially, that what I just described is the right way to define the kind of generalised convex space I mentioned above, and that the categories of these algebras are comparable to the category of convex spaces from Sam's talk. This is pretty neat and gives me something to sink my teeth into.

I still really like Sam's geometric approach to this topic. It makes all this feel much more concrete, and I wouldn't have understood why this was worth thinking about if it wasn't for his talk.

I'll think more about the enrichment thing now that I understand this, and ask questions if I need to. I'm still interested in whether there's anything fundamental that can be got at regarding how this relates to Markov categories, if it does.

One naïve question: can we say in general that the Kleisli category of a monad is enriched in the category of algebras over the monad? That seems to be what's happening here, and I'm wondering whether it's a general phenomenon or something special about probability monads.

view this post on Zulip Jules Hedges (Jul 01 2020 at 13:51):

Nathaniel Virgo said:

One naïve question: can we say in general that the Kleisli category of a monad is enriched in the category of algebras over the monad? That seems to be what's happening here, and I'm wondering whether it's a general phenomenon or something special about probability monads.

This isn't totally precise, but if you're in a nice enough situation (probably a strong monad on a cartesian closed category, or something) then the hom-object looks like $x \to Ty$ and it can be given the structure of a T-algebra $T(x \to Ty) \to (x \to Ty)$ by ... well, I could write it using Haskell do-notation, it would be a pain to write without it. It's $a \mapsto x \mapsto \text{do}\,\{ f \leftarrow a;\ f(x) \}$
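
Written out as actual Haskell (a sketch; any `Monad` in Haskell is strong, so this typechecks for any `t`):

```haskell
-- The t-algebra structure on the hom-object x -> t y: sample a function,
-- then apply it to the argument.
algebra :: Monad t => t (x -> t y) -> (x -> t y)
algebra a = \x -> do
  f <- a
  f x
```

Equivalently, `algebra a = \x -> a >>= ($ x)`.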

view this post on Zulip Tobias Fritz (Jul 01 2020 at 14:10):

Jules Hedges said:

This isn't totally precise, but if you're in a nice enough situation (probably a strong monad on a cartesian closed category, or something) then the hom-object looks like $x \to Ty$ and it can be given the structure of a T-algebra $T(x \to Ty) \to (x \to Ty)$ by ... well, I could write it using Haskell do-notation, it would be a pain to write without it. It's $a \mapsto x \mapsto \text{do}\,\{ f \leftarrow a;\ f(x) \}$

As I wrote above, I'm pretty sure that you need a commutative monad on a (merely) monoidal closed category. Anders Kock's paper is where the construction of hom-objects in the category of algebras has been spelled out, using the additional assumption that equalizers exist. I bet that his construction specializes to yours when applied to free algebras, i.e. to the Kleisli category, and in particular that the relevant equalizer in this case is exactly your $x \to Ty$.

view this post on Zulip Tobias Fritz (Jul 01 2020 at 14:27):

@Nathaniel Virgo: Great! That kind of intuition for the algebras and the Kleisli morphisms is also explained very nicely in Chapter 1 of Paolo's thesis.

Nathaniel Virgo said:

One naïve question: can we say in general that the Kleisli category of a monad is enriched in the category of algebras over the monad? That seems to be what's happening here, and I'm wondering whether it's a general phenomenon or something special about probability monads.

In general no, the Kleisli category of a monad isn't enriched in the category of algebras. For example, consider the free group monad (on Set). Its category of algebras is exactly the category of groups, and its Kleisli category is equivalent to the category of free groups with group homomorphisms. To first approximation, the set of group homomorphisms between free groups doesn't have any additional structure beyond being a mere set, and in particular this Kleisli category is not enriched in the corresponding category of algebras. If you're more familiar with monoids or rings than with groups, then you can consider these instead, with the counterexample working the same way.

So at the very least, in order to get some enrichment you will probably want a commutative monad. In fact, since you can enrich only over a monoidal category in the first place, you will want to have a monoidal structure on the category of algebras; and that's exactly the tensor product that I've mentioned above, which exists for every commutative monad (with the existence of coequalizers as a mild additional assumption). The good news is that all probability monads are commutative! Thus one has analogues of Sam's tensor product, and can consider categories enriched in algebras. However, it's still not always clear whether the Kleisli category has such an enrichment, since the construction of hom-objects as given by Kock or @Jules Hedges only applies when the original category is monoidal closed.

If you're interested, we can try to work out a more general construction to explain why the Kleisli category of a probability monad is at least enriched in convex spaces, even if not necessarily enriched in the category of algebras.

view this post on Zulip Reid Barton (Jul 01 2020 at 14:29):

I was going to give the same example though. Group homomorphisms between general groups don't form a group, but group homomorphisms between free groups do, because they're really morphisms from a set into a (free) group, and this is a (non-free) group by elementwise product. I think this is related to Jules's strong monad setting.

view this post on Zulip Tobias Fritz (Jul 01 2020 at 14:33):

Oh right, of course :+1: But to get enrichment, you will also want composition to respect the enrichment. Is there a sense in which this is the case? What is the relevant monoidal structure on groups?

view this post on Zulip Reid Barton (Jul 01 2020 at 14:39):

Yeah, this seems like a bigger problem. I didn't read your second paragraph at first because of the long lines :slight_smile: but I agree you wouldn't get an enrichment unless the monad is commutative simply because there is no monoidal category to enrich in.

view this post on Zulip Reid Barton (Jul 01 2020 at 14:40):

I think you would only get a structure where $f \circ -$ is a group homomorphism for each $f$.

view this post on Zulip Reid Barton (Jul 01 2020 at 14:40):

(might have $f$ on the wrong side.)

view this post on Zulip Tobias Fritz (Jul 01 2020 at 14:42):

Yep, that makes sense. Interesting.

view this post on Zulip Paolo Perrone (Jul 01 2020 at 15:47):

In order to enrich over something, you need that "something" to be closed, so that you can talk about internal homs.
If the monad is commutative, or equivalently monoidal, then the category of algebras has a canonical internal hom.
Now, the Kleisli category, as a subcategory of the category of algebras, is not closed; however, it's still enriched in algebras, being a subcategory of a closed category. If I understand you correctly, that's what you need.

view this post on Zulip eric brunner (Jul 01 2020 at 16:03):

Tobias Fritz said:


One reference that I know of (thanks to a pointer by Bill Lawvere) is the 1987(?) thesis of Xiao-Qing Meng, where this appears on p.11-12. The convex structure on the hom-sets of the category of probabilistic mappings also plays an important role in the thesis.
...
The explicit construction of these tensor products as coequalizers is exactly the general abstract version of Sam's/Meng's construction.

thanks for that reference!

view this post on Zulip Nathaniel Virgo (Jul 02 2020 at 01:32):

I'm struggling a bit with the notion of a commutative monad. It sounds like it should be something simple, but the definition is super abstract and I'm not able to fit it all into my head at once. Could somebody perhaps give an example or a description of a "strength" for a probability monad? That would probably nudge me towards being able to grasp what these concepts are about.

(Alternatively, if there's a string diagram way to talk about commutative monads, that would probably help me too.)

view this post on Zulip Nathaniel Virgo (Jul 02 2020 at 02:03):

Just a quick note in case it's relevant, I chanced across a recent preprint that seems to be about the topics of this thread: Sturtz (June 2020) The existence and utility of Giry algebras in probability theory, https://arxiv.org/abs/2006.08290

view this post on Zulip Mike Stay (Jul 02 2020 at 04:52):

Paolo Perrone said:

In order to enrich over something, you need that "something" to be closed, so that you can talk about internal homs.

The "something" just needs to be a monoidal category:

The category K must be monoidal, so that we can define composition as a morphism
    ∘: hom(y,z) ⊗ hom(x,y) → hom(x,z)

The use of "hom" here is just a function from pairs of objects of the enriched category to objects of K.K.

view this post on Zulip Mike Stay (Jul 02 2020 at 05:16):

Nathaniel Virgo said:

I'm struggling a bit with the notion of a commutative monad.

A monoid in the category Mon of monoids and monoid homomorphisms is a commutative monoid: on top of its underlying multiplication $\otimes$, it has a second unital multiplication $\circ$ that is a monoid homomorphism with respect to $\otimes$, which amounts to the interchange law $(a \otimes b) \circ (c \otimes d) = (a \circ c) \otimes (b \circ d)$ together with the two units coinciding.

The Eckmann-Hilton argument shows that $\circ$ and the multiplication $\otimes$ in the monoid $M$ have to coincide and that they're commutative:
$$a\circ b=(1\otimes a)\circ (b\otimes 1)=(1\circ b)\otimes (a\circ 1)=b\otimes a=(b\circ 1)\otimes (1\circ a)=(b\otimes 1)\circ (1\otimes a)=b\circ a$$

In the same way, a monoidal monad is a commutative monad. Being a monoidal monad $T\colon C \to C$ means that $T$ plays nicely with the tensor product in $C$: there are two natural transformations from $TX \otimes TY$ to $T(X \otimes Y)$ built from the strength, costrength, and multiplication, and the two transformations are equal to each other.

Strength for the probability monad takes an outcome $x$ and a probability distribution $\sum_i c_i y_i$ and produces the distribution $\sum_i c_i (x, y_i)$. Strength for the list monad takes a value $x$ and a list $[y_1, y_2, \ldots, y_n]$ and produces the list $[(x, y_1), (x, y_2), \ldots, (x, y_n)]$.
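
In Haskell both examples are instances of one definition (a sketch; every functor on Hask is canonically strong, thanks to currying):

```haskell
-- Strength: pair a bare value with everything inside the functor.
strength :: Functor t => (x, t y) -> t (x, y)
strength (x, ty) = fmap (\y -> (x, y)) ty

-- Costrength, symmetrically.
costrength :: Functor t => (t x, y) -> t (x, y)
costrength (tx, y) = fmap (\x -> (x, y)) tx

-- For the list monad: strength (7, [1, 2, 3]) == [(7, 1), (7, 2), (7, 3)];
-- for a distribution monad, fmap acts on the support, matching the formula above.
```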

view this post on Zulip Jules Hedges (Jul 02 2020 at 09:48):

Here's a different perspective, which might help with intuition even if you don't know what it means. Commutativity is equivalent to the fact that the Haskell programs $(a, b) \mapsto \text{do}\,\{ x \leftarrow a;\ y \leftarrow b;\ \text{return}\,(x, y) \}$ and $(a, b) \mapsto \text{do}\,\{ y \leftarrow b;\ x \leftarrow a;\ \text{return}\,(x, y) \}$ are equal

view this post on Zulip Jules Hedges (Jul 02 2020 at 09:49):

So it says that there is no "side channel information flow" beyond what is made explicit in the types

view this post on Zulip Jules Hedges (Jul 02 2020 at 09:49):

So an example of a noncommutative monad is state, since there is implicit information flow through the state variable
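
A runnable illustration of that failure, assuming the `mtl` package for `Control.Monad.State`:

```haskell
import Control.Monad.State

-- The two sequencings from the message above, for an arbitrary monad.
pairLR, pairRL :: Monad m => m a -> m b -> m (a, b)
pairLR ma mb = do { x <- ma; y <- mb; return (x, y) }
pairRL ma mb = do { y <- mb; x <- ma; return (x, y) }

-- A state action with an observable side effect: return the counter, then bump it.
tick :: State Int Int
tick = do { n <- get; put (n + 1); return n }

-- The order of effects leaks into the result, so State is not commutative:
--   runState (pairLR tick tick) 0 == ((0, 1), 2)
--   runState (pairRL tick tick) 0 == ((1, 0), 2)
```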

view this post on Zulip Tobias Fritz (Jul 02 2020 at 11:02):

As was already mentioned, commutative monads and (symmetric) monoidal monads are equivalent structures. And symmetric monoidal monads are easy to understand from the probability perspective: the monoidal structure is encoded in a transformation $PX \otimes PY \to P(X \otimes Y)$, which implements the formation of product distributions. The definition of commutative monad amounts to the condition that the two transformations $PX \otimes PY \to P(X \otimes Y)$, one constructed from the strength $X \otimes PY \to P(X \otimes Y)$ and the other from the costrength $PX \otimes Y \to P(X \otimes Y)$, coincide. In other words, a commutative monad is one which has a well-defined concept of forming product distributions.

A probability monad should also be expected to have an oplax monoidal structure, $P(X \otimes Y) \to PX \otimes PY$, which implements the formation of marginals. We've worked out the details of how the formation of joints and marginals interact in terms of the monad in our bimonoidal structures paper. This oplax monoidal structure exists and behaves as expected for example whenever $P$ lives on a cartesian monoidal category.
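
For the finite distribution monad, both structures can be sketched directly (a hypothetical weighted-list representation; duplicate entries in the marginals are left unmerged for brevity):

```haskell
-- Finitely supported distributions as weighted lists, weights summing to 1.
newtype Dist a = Dist [(a, Double)] deriving Show

-- Monoidal structure P X ⊗ P Y → P (X × Y): the product distribution.
prod :: Dist x -> Dist y -> Dist (x, y)
prod (Dist xs) (Dist ys) = Dist [((x, y), p * q) | (x, p) <- xs, (y, q) <- ys]

-- Oplax structure P (X × Y) → P X ⊗ P Y: the two marginals.
marginals :: Dist (x, y) -> (Dist x, Dist y)
marginals (Dist xys) =
  ( Dist [(x, p) | ((x, _), p) <- xys]
  , Dist [(y, p) | ((_, y), p) <- xys] )
```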

view this post on Zulip Sam Tenka (Jul 02 2020 at 18:52):

@Tobias Fritz Thanks for pointing out these errors! I've asked @David Spivak to help me correct them by annotating the video.

view this post on Zulip Sam Tenka (Jul 02 2020 at 18:55):

Nathaniel Virgo said:

Hi Sam Tenka , sorry for the late reply. I also have a notation issue: what do you mean by $xa$ and $a + \alpha$ in your description of the relation you're modding out by? ($a$ is an element of a convex space, I think, so we can form convex combinations with other elements, but not add them or multiply by scalars.)

Hi @Nathaniel Virgo ! Here $a$ and $\alpha$ are elements of the convex space $A$, and $x$ is a real number. So we don't have that $x\cdot a$ is in $A$, but by abuse of notation we still write it in larger contexts such as $x\cdot a + p$ or $x \cdot a + (1-x) \cdot \alpha$ that are valid convex combinations. Hope this helps :slight_smile:

view this post on Zulip Nathaniel Virgo (Jul 03 2020 at 06:29):

Thanks everyone, this is all extremely helpful. The graphical notation in Tobias and Paolo's bimonoidal structures paper seems very helpful for thinking about this stuff, so I'm working through it. I'll probably have some more comments and questions after that.

view this post on Zulip Nathaniel Virgo (Jul 07 2020 at 01:43):

I've been working my way through all the material that was posted, and most of the points above are quite clear now, although I still feel I could understand the tensor product of algebras construction a bit better.

But while working through the bimonoidal structures paper, I came up with another question, which is only tangentially related but possibly also interesting. In the framework of that paper, the following is a function that takes a joint distribution between A and B and maps it to a distribution where A and B have the same marginals but are independent. In other words, it maps $p(A,B) \mapsto p(A)p(B)$, elementwise.

image.png

One might also want to perform the map $p(A,B,C) \mapsto p(B)p(A|B)p(C|B) = p(A,B)p(B,C)/p(B)$, that is, create a distribution where $A\times B$ and $B\times C$ have the same marginals, but A and C are conditionally independent given B. Is there a way to write that in this framework?

Instinct says to try writing

image.png

where

image.png

means the operation of conditioning on the two B's being equal, so it maps $p(A,B_1,B_2,C) \mapsto p(A,B_1,C)\,p(B_1{=}B_2)/Z$, where $Z$ is such that the resulting distribution is normalised.

Having played around with that idea a bit, I'm not sure how easy it is to make it work consistently. But I'm wondering if conditioning in this way is something that's been thought about in the context of probability monads, and/or if there's a different way to write or think about the map $p(A,B,C) \mapsto p(B)p(A|B)p(C|B)$ in this kind of framework.
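
For the first map, $p(A,B) \mapsto p(A)p(B)$, a monadic sketch (for any probability-like monad written in do-notation): sample the joint twice independently and keep one coordinate from each sample.

```haskell
-- p(A,B) ↦ p(A)·p(B): the product of the two marginals, expressed monadically.
indep :: Monad p => p (a, b) -> p (a, b)
indep joint = do
  (a, _) <- joint   -- first independent sample: keep the A-coordinate
  (_, b) <- joint   -- second independent sample: keep the B-coordinate
  return (a, b)
```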

view this post on Zulip Tobias Fritz (Jul 07 2020 at 10:52):

That's very interesting! But unfortunately, conditioning is notoriously problematic in categorical terms, largely because one needs to account for conditioning on events of probability zero. These problems also haunt your proposal: what do you intend your diagram to mean if $p(B_1 = B_2) = 0$? This may seem like a mere nuisance problem in the case of discrete variables, but for continuous probability it is the generic case! Two continuous variables being equal typically has probability zero.

So the semantics of your black dot $B \otimes B \to B$ cannot represent conditioning on equality. It can represent the equality test, or filtering for equality, by which I mean the unnormalized distribution $p(A,B_1,B_2,C) \mapsto p(A,B_1,C)\,p(B_1{=}B_2)$. Together with the copy map $B \to B \otimes B$, this equality test can be formalized as a data service in the sense of Pavlovic, as it came up at ACT yesterday in John Nolan's talk. So this makes sense as long as you're happy to work with subnormalized or arbitrarily normalized distributions. But then you still have the problem that the equality test tends to fail with probability one for continuous variables, in which case you will always obtain the zero distribution.
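
In the discrete weighted-list picture, the unnormalized equality test is just a filter (a sketch; `Dist` here is deliberately allowed to be subnormalized):

```haskell
-- Weighted lists, not required to sum to 1 (subnormalized distributions).
newtype Dist a = Dist [(a, Double)] deriving Show

-- Filtering for equality of the two B-coordinates, without renormalizing;
-- for "continuous" B this would almost always return the zero distribution.
condEq :: Eq b => Dist (a, b, b, c) -> Dist (a, b, c)
condEq (Dist ws) = Dist [((a, b1, c), p) | ((a, b1, b2, c), p) <- ws, b1 == b2]
```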

Perhaps a better approach is to embrace the problems with conditioning and to accept that conditioning is not an algebraic operation. This is the approach taken by Markov categories, with conditioning treated by Cho and Jacobs and in Section 11 of my paper. This way, conditioning is not an operation because conditionals are not unique, but they can still be proven to be unique up to almost sure equality, which is exactly what happens in measure-theoretic probability. Definition 12.8 of my paper and the discussion after it also treat conditional products; as you may know, your $p(A,B,C) \mapsto p(B)p(A|B)p(C|B) = p(A,B)p(B,C)/p(B)$ is a special case of these. This applies both in discrete probability and measure-theoretic probability.

Finally, perhaps it's worth mentioning hypernormalisation as another potential way of avoiding the problems with conditioning. This may be worth exploring further, and in particular (as far as I know) still needs to be extended beyond the discrete case. But it's hard to say whether this will be possible at all, and whether this approach is ultimately as expressive as the Markov categories one, for example by facilitating a definition of conditional products.

view this post on Zulip Nathaniel Virgo (Jul 24 2020 at 00:56):

Hi all

I came back to this in the last couple of days, and I'm trying to get a better understanding of the categorical construction of the tensor product of algebras. I've realised I have a mental block about something, which is making it very hard to understand. The nLab says this:

image.png

We have something similar in the paper by Seal:

image.png

In both cases, they say it's defined as a coequaliser in the Eilenberg-Moore category, but the diagram they give doesn't seem to be a diagram in the Eilenberg-Moore category (at least not in an obvious way). Instead it's a diagram in the original, base category.

I imagine this is one of those things like "natural in A and B," where it's fairly clear why people say it that way once you understand it, but it requires a logical leap that's hard to make if it's not spelt out the first time you see it. Is someone able to expand on the definition above a little bit, so that I can understand why it makes sense to talk about a coequaliser in CTC^T in regards to the given diagram?

view this post on Zulip Nathaniel Virgo (Jul 24 2020 at 01:05):

I guess to be more explicit, for it to be a diagram in $C^T$, we would have to see $T\kappa;\mu$ and $T(a\otimes b)$ as morphisms of monad algebras. I guess the types would make sense if I see them as morphisms between the free algebras $(T(TA\otimes TB),\mu)$ and $(T(A\otimes B),\mu)$. But if that's the interpretation then I'm really confused about the motivation - why are we considering these particular algebras, in order to define a tensor product between $A$ and $B$?

view this post on Zulip Paolo Perrone (Jul 24 2020 at 13:11):

As you are suggesting, it's about free algebras. Namely, for every object $X$ of the base category, $(TX,\mu)$ is a $T$-algebra (why do the necessary diagrams commute?) and for every morphism $f:X\to Y$ of the base category, the morphism $Tf:TX\to TY$ gives a morphism of algebras (this one I tell you: by naturality of $\mu$). Moreover, $\mu:TTX\to TX$ is also a morphism of free algebras, by the associativity diagram, and more generally, for every algebra $(A,e)$, the structure map $e:TA\to A$ lives in the Eilenberg-Moore category (why?)
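
For reference, the two algebra axioms for $(TX,\mu_X)$ are exactly instances of the monad laws (one way to approach the first "why?"):

$$\mu_X \circ \eta_{TX} = \mathrm{id}_{TX}, \qquad \mu_X \circ T\mu_X = \mu_X \circ \mu_{TX}.$$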

I hope it's clearer now. The lack of introductory material on monoidal monads and the like is indeed frustrating, especially since it's all so similar to ring theory. One day I'd like to write something myself; I cannot find anything out there.

view this post on Zulip Paolo Perrone (Jul 24 2020 at 14:43):

By the way, notice also that, while the forgetful functor $C^T\to C$ preserves limits (it's a right adjoint), it generally does not preserve colimits - therefore, the coequalizer in the category of algebras may differ from the one in the base category. (And Beck's theorem can't be used here: the coequalizer in question is not in general split by the forgetful functor.)

In particular, if the base category is Set, taking the coequalizer in the category of algebras amounts to quotienting out the congruence generated by the two parallel maps, rather than just the equivalence relation generated by the two. For the free abelian group monad, for example, this amounts to identifying elements belonging to the subgroup generated by the two maps, which may be larger than just the subset generated by them.

view this post on Zulip Paolo Perrone (Jul 24 2020 at 14:48):

In any case, if you are familiar with the tensor product of vector spaces (or modules, etc.), the best way to approach the tensor products of general $T$-algebras is probably via this universal property, which can be taken as the definition (and is equivalent to the coequalizer one as long as the necessary colimits exist).

view this post on Zulip John Baez (Jul 24 2020 at 17:07):

Paolo Perrone said:

The lack of introductory material on monoidal monads and the like is indeed frustrating, especially since it's all so similar to ring theory. One day I'd like to write something myself; I cannot find anything out there.

Durov has written a vast amount on commutative monads, as a warmup to his new approach to algebraic geometry:

Don't be put off by the title; his thesis is 568 pages long, and a long section explains commutative monads to people who don't know about monads.

Then he redoes algebraic geometry replacing commutative rings with commutative monads!

view this post on Zulip Nathaniel Virgo (Jul 25 2020 at 08:59):

Thanks @Paolo Perrone, that's helpful. So my guess about how to read the diagram was correct, but I'm still kind of stuck on the 'why'.

If we write all the algebras explicitly as pairs, the definition says that the tensor product comes with a morphism of algebras $(T(A\times B),\mu) \xrightarrow{q_{(A,a),(B,b)}} (A,a)\otimes (B,b)$. So in a probability context, we're forming equivalence classes of joint distributions over $A$ and $B$, where the equivalence relation has to do with two different ways of passing from $T(T(A)\times T(B))$ to just $T(A\times B)$.

I guess the question is, what does this have to do with the story about representing bihomomorphisms? If I could understand, on an intuitive level, why the thing that represents bihomomorphisms should be a quotient of $T(A\times B)$, I think I'd be a lot closer to getting it.

view this post on Zulip Reid Barton (Jul 25 2020 at 12:31):

Well, giving a $T$-map out of $T(A \times B)$ is the same as giving an ordinary function out of $A \times B$, and a bihomomorphism from $(A, B)$ to something is an ordinary function out of $A \times B$ that satisfies some extra conditions, so if we can express those conditions as equating two specific $T$-maps into $T(A \times B)$, we'd be done.

view this post on Zulip Tobias Fritz (Jul 25 2020 at 12:49):

@Nathaniel Virgo: do you know how and why every $T$-algebra $A$ is a coequalizer of the two canonical morphisms $TTA \to TA$? That's a similar construction, so it may be helpful to look into it first. For example in the vector space case, it amounts to saying that every vector space $V$ is a quotient of $TV$, which is the (enormous) vector space which has $V$ as a basis; and $TTV$ then tells you what you need to take the quotient by, i.e. which formal linear combinations in $V$ evaluate to the same element of $V$. The tensor product construction uses the same way of thinking.

view this post on Zulip Nathaniel Virgo (Jul 26 2020 at 04:40):

Thanks to both of you, those comments were helpful, and I now understand the thing I was stuck on. I'm writing it up here mostly for the sake of being able to explain it to someone else in the future, if I end up needing to.

Let $(A,a), (B,b), (C,c)$ be $T$-algebras in $\mathscr{C}$, and let $f:A\times B\to C$ be a bihomomorphism. $f$ isn't a morphism in the Eilenberg-Moore category $\mathscr{C}^T$, because in general $f{;}c \ne (a\times b){;}f$. However, $Tf:T(A\times B)\to TC$ can be seen as a morphism in $\mathscr{C}^T$, with type $(T(A\times B),\mu)\to (TC,\mu)$.

The point of the construction is to find an object $A\otimes B$ in $\mathscr{C}^T$ and a morphism $q$, such that any morphism of the form $Tf$ (where $f$ is a bihomomorphism) will factor uniquely as $T(A\times B) \xrightarrow{q} A\otimes B \xrightarrow{f'} TC$. The key trick that I hadn't understood is that although we can't factor $f$ itself in $\mathscr{C}^T$ (because $f$ doesn't exist in $\mathscr{C}^T$), we can still factor $Tf$. This explains why the domain of $q$ is $T(A\times B)$. Once I'd understood that, the rest made sense.

To see where the coequaliser diagram comes from, we use the result that $f$ is a bihomomorphism if and only if $T\kappa{;}\mu{;}Tf = T(a\times b){;}Tf$, which can be shown using algebraic manipulations starting from the definition of a bihomomorphism. We want $Tf$ to factor through $q$ whenever $f$ has this property, and the coequaliser diagram is giving us the universal object for which this is the case.

The above all looks much prettier in string diagrams. At some point I'll render them all nicely and post it somewhere.