Category Theory
Zulip Server
Archive

You're reading the public-facing archive of the Category Theory Zulip server.
To join the server you need an invite. Anybody can get an invite by contacting Matteo Capucci at name dot surname at gmail dot com.
For all things related to this archive refer to the same person.


Stream: theory: categorical probability

Topic: Gaussian mixtures


view this post on Zulip Drew Allen McNeely (May 20 2022 at 05:57):

Hey everyone, @Dario Stein and I were emailing about his paper, but I figured I should post my questions here in case anyone is interested in chiming in:

There's a remark in @Tobias Fritz's paper that "Gauss has the flavors of a Kleisli category," but that no one can seem to find the associated monad and he conjectures that there isn't one.
I've been frustrated by that fact for a while, but your decorated maps construction here seems to scratch that itch. I'm wondering if we can extend it to, e.g., truncated Gaussians on Riemannian manifolds and other more exotic spaces?

Also, I've been curious for a long time about constructing Gaussian mixture models as a composition of types: in particular the weighted list functor on the Gaussian type. Tobias suggested I try looking into distributive laws. It was easy to find the symmetric monoidal compatibility morphisms for composed monads using a distributive law, but again Gauss had the issue of not arising from a monad as a Kleisli category. Further, a distributive law doesn't quite make sense here, as a convex combination of Gaussians makes sense, but a Gaussian distribution of weighted lists does not. Do you think instead there might be a way to push decorated maps through some functor, and construct everything required to make a Markov category in a way that resolves to Gaussian mixtures in a special case?

view this post on Zulip Tobias Fritz (May 20 2022 at 16:58):

So here's a conjectural construction which would let us construct a Markov category of Gaussian mixture models, but at the same time would apply way more generally. Before I state it, let me motivate it a bit and provide some intuition. For a Markov category of Gaussian mixture models, we presumably want the morphisms to be probability distributions over morphisms in $\mathsf{Gauss}$. So is there a general construction that lets us start with a Markov category $\mathsf{C}$ and construct a new Markov category whose morphisms are probability distributions over morphisms of $\mathsf{C}$?

Here's how I think this works. Let $\mathsf{C}$ be any [[commutative monad]] on $\mathsf{Set}$ (or whatever category $\mathsf{C}$ is enriched in), and suppose that $T$ is affine, which means $T1 \cong 1$. Now define a new category $T\mathsf{C}$ with the same objects as $\mathsf{C}$, but where $(T\mathsf{C})(X,Y) := T(\mathsf{C}(X,Y))$. For composition, we should be able to use
$$T(\mathsf{C}(X,Y)) \times T(\mathsf{C}(Y,Z)) \longrightarrow T(\mathsf{C}(X,Y) \times \mathsf{C}(Y,Z)) \to T(\mathsf{C}(X,Z)),$$
where the first morphism is the laxator of $T$ and the second one is induced by composition in $\mathsf{C}$. The monoidal structure should be defined similarly, and $T\mathsf{C}$ should also inherit the Markov structure from $\mathsf{C}$.

Conjecture: This construction makes sense and makes $T\mathsf{C}$ into a Markov category.

Would you want to think about it? If we now take $T$ to be the usual distribution monad and take $\mathsf{C}$ to be $\mathsf{Gauss}$, then we should get a Markov category for Gaussian mixture models.
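For concreteness, the conjectured composition can be sketched for the finite distribution monad, with $\mathsf{C} = \mathsf{FinSet}$. This is a minimal illustrative sketch: the encoding (morphisms as tuples of outputs, distributions as dicts from values to weights) and all names are ours, not from the thread.

```python
# Sketch of the conjectured composition in T C, taking T to be the finite
# distribution monad and C = FinSet. A morphism f : {0,...,m-1} -> {0,...,n-1}
# is encoded as the tuple of its outputs; a distribution is a dict
# mapping values to weights.

def laxator(d1, d2):
    """T X x T Y -> T(X x Y): the product distribution."""
    return {(a, b): p * q for a, p in d1.items() for b, q in d2.items()}

def T(phi, d):
    """Functorial action of T: push d forward along phi, merging outcomes
    that phi identifies."""
    out = {}
    for a, p in d.items():
        out[phi(a)] = out.get(phi(a), 0.0) + p
    return out

def comp(pair):
    """Composition in FinSet on an encoded pair (f : X -> Y, g : Y -> Z)."""
    f, g = pair
    return tuple(g[f[x]] for x in range(len(f)))

def compose_TC(df, dg):
    """Composition in T C: apply the laxator, then T of composition in C."""
    return T(comp, laxator(df, dg))
```

For example, composing the uniform mixture of the identity and the swap on the two-element set with itself returns the same mixture, since the four composites pair off into identities and swaps.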

view this post on Zulip Sam Staton (May 20 2022 at 17:03):

Interesting! Presumably C needs to be an affine monad? or did I miss a trick?

view this post on Zulip Tobias Fritz (May 20 2022 at 17:12):

Of course, thanks! I've fixed that above.

view this post on Zulip Sam Staton (May 20 2022 at 17:48):

Great. Might it still be related to a distributive law? Iirc any Markov category $\mathsf C$ fully embeds in $Kl(G)$ for $G$ a monad on $[{\mathsf C}_{det}^{op},Set]$. And your monad $T$ on $Set$ can be regarded pointwise as a monad on $[{\mathsf C}_{det}^{op},Set]$. And maybe you've essentially given a canonical distributive law of $G$ over $T$, so that your category $T\mathsf C$ fully embeds in $Kl(TG)$.

view this post on Zulip Tobias Fritz (May 20 2022 at 18:34):

Nice! I don't think I have a good enough understanding of how that embedding works to think about whether that hypothesis is plausible. In other words, I'm being too dense :upside_down:

view this post on Zulip Sam Staton (May 20 2022 at 18:47):

I'll take "dense" as a pun :smiley: . I just meant that using the Yoneda lemma, and writing $y$ for the Yoneda embedding, and $\hat T$ for the monad on $[{\mathsf C}_{det}^{op},Set]$ that is $T$ pointwise,

$$Kl(\hat TG)(yX,yY) = [{\mathsf C}_{det}^{op},Set](yX,\hat T(G(yY))) = \hat T(G(yY))(X) = T(G(yY)(X)) = T(Kl(G)(yX,yY)) = T(\mathsf{C}(X,Y)).$$

view this post on Zulip Drew Allen McNeely (May 25 2022 at 21:21):

Tobias Fritz said:

For a Markov category of Gaussian mixture models, we presumably want the morphisms to be probability distributions over morphisms in $\mathsf{Gauss}$.

Yes, indeed! That's how I tried constructing it anyways, but I didn't think to work with the hom-objects. I was still trying to work with morphisms in $\mathsf{Gauss}$ as pseudo-Kleisli maps in $\mathsf{Set}$, and had no idea how to represent distributions of morphisms.

I should remark, in my field Gaussian mixtures are not represented this way, but thinking about it, I think they're just a special case of your construction. It's typically only the distribution that's modeled as a convex combination of Gaussians, and they're pushed through a deterministic nonlinear model (which is either linearized about the mean of each Gaussian component, or it's approximated using the unscented transform). But the distribution would just be an element of $\mathsf{C}(1,X)$, which gets composed with a Dirac distribution of a deterministic morphism in $\mathsf{C}$.

I might restate everything just for my own understanding:

Let $\mathsf{C}$ be any [[commutative monad]] on $\mathsf{Set}$

Did you mean that $T$ is a commutative monad? I'm having a hard time understanding how $\mathsf{C}$ or its hom-functor can be treated as a monad, unless I'm missing something? Also I think $T$ needs the monad NTs for the following:

I'm just now reading about commutative monads. By laxator, do you mean the compositions $\mu \circ T\tau \circ \sigma = \mu \circ T\sigma \circ \tau$ in the commutative diagram for commutative monads? I'm assuming that for the finite distribution monad, this would take $\left(\{w_i, f_i\}_i\ , \{v_j, g_j\}_j\right) \mapsto \{w_i v_j\ , (f_i, g_j)\}_{ij}$, i.e. would give you all possible combinations of compositions/bimaps? (Edit: wait, this is literally just the $\nabla$ compatibility morphism from page 17 of your synthetic paper)

For the Markov structure, do we just use Dirac distributions of the Markov morphisms from $\mathsf{C}$? I.e. $\mathrm{copy}^{T\mathsf{C}}_X = \eta^T_{\mathsf{C}(X,X\otimes X)}(\mathrm{copy}^{\mathsf{C}}_X)$ and so forth?

Where does the affineness of $T$ come in? I think in general, all the little details of figuring out which axioms/properties are needed for what tend to get lost on me. I'm sure your gs-categories paper will help with that, so I should try to go back and read it again :sweat_smile:

The final question, which I'm sure is more difficult, is whether or not we can bring conditioning into this. I might try to chew on it! Thanks for the boost though :)

view this post on Zulip Tobias Fritz (May 26 2022 at 13:12):

Hi! Great to see this going!

Drew Allen McNeely said:

I should remark, in my field Gaussian mixtures are not represented this way, but thinking about it, I think they're just a special case of your construction. It's typically only the distribution that's modeled as a convex combination of Gaussians, and they're pushed through a deterministic nonlinear model (which is either linearized about the mean of each Gaussian component, or it's approximated using the unscented transform). But the distribution would just be an element of $\mathsf{C}(1,X)$, which gets composed with a Dirac distribution of a deterministic morphism in $\mathsf{C}$.

Sounds good! I'm not sure why it "gets composed with a Dirac distribution of a deterministic morphism in $\mathsf{C}$". Are you referring to a random variable here, in the sense of a deterministic function on a probability space?

BTW, I suspect that the construction isn't quite what one might want yet. Assuming that it works, I think that it works well as far as morphisms $1 \to X$ are concerned. But a general morphism $A \to X$ contains a bit more data than we may want. For example, if we apply the construction to $\mathsf{FinSet}$ instead and aim to construct $\mathsf{FinStoch}$ like that, then we get two different morphisms $2 \to 2$ which coincide in $\mathsf{FinStoch}$, namely one which is the uniform mixture of the identity and the swap, and the other being the uniform mixture of the two constant functions. So we still need to form a certain quotient in order to arrive at $\mathsf{FinStoch}$.
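That $\mathsf{FinSet}$ example can be checked concretely. In the same illustrative encoding as before (functions as tuples of outputs, mixtures as dicts from functions to weights; names are ours), the two mixtures are distinct as elements of $T(\mathsf{FinSet}(2,2))$ but induce the same stochastic matrix:

```python
# Two distinct mixtures of functions 2 -> 2 that induce the same stochastic
# matrix, i.e. the same morphism of FinStoch. Functions are tuples of
# outputs; mixtures are dicts mapping function -> weight.

identity, swap = (0, 1), (1, 0)
const0, const1 = (0, 0), (1, 1)

mix1 = {identity: 0.5, swap: 0.5}   # uniform mixture of identity and swap
mix2 = {const0: 0.5, const1: 0.5}   # uniform mixture of the two constants

def kernel(mix, n=2):
    """The stochastic matrix P(y|x) of a mixture of functions: this is the
    quotient map to FinStoch under which mix1 and mix2 coincide."""
    return [[sum(w for f, w in mix.items() if f[x] == y) for y in range(n)]
            for x in range(n)]
```

Both mixtures map to the uniform kernel, so the quotient really does identify them.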

I might restate everything just for my own understanding:

Let $\mathsf{C}$ be any [[commutative monad]] on $\mathsf{Set}$
Did you mean that $T$ is a commutative monad?

Whoops :hushed: Yes, that's what I meant!

Also I think $T$ needs the monad NTs for the following:

What is a monad NT?

By laxator, do you mean the compositions $\mu \circ T\tau \circ \sigma = \mu \circ T\sigma \circ \tau$ in the commutative diagram for commutative monads?

I'm not quite sure what $\tau$ and $\sigma$ are, but yes, that's probably what I mean. In more down-to-earth terms, I mean the natural transformation with components $TX \times TY \longrightarrow T(X \times Y)$, which intuitively corresponds to the formation of product measures. If you haven't already seen it, Appendix C in here may be one place to read about this in the probability context.

I'm assuming that for the finite distribution monad, this would take $\left(\{w_i, f_i\}_i\ , \{v_j, g_j\}_j\right) \mapsto \{w_i v_j\ , (f_i, g_j)\}_{ij}$, i.e. would give you all possible combinations of compositions/bimaps? (Edit: wait, this is literally just the $\nabla$ compatibility morphism from page 17 of your synthetic paper)

Exactly :100:

For the Markov structure, do we just use Dirac distributions of the Markov morphisms from $\mathsf{C}$? I.e. $\mathrm{copy}^{T\mathsf{C}}_X = \eta^T_{\mathsf{C}(X,X\otimes X)}(\mathrm{copy}^{\mathsf{C}}_X)$ and so forth?

Yes, that's what I'm hoping will work.

Where does the affineness of $T$ come in?

That is relevant (at least) for making sure that the new category $T\mathsf{C}$ will again be Markov, i.e. that $1$ will again be terminal: we want $T\mathsf{C}(X,1) \cong T1 \cong 1$. Without that, maybe everything else still works in the sense that $T\mathsf{C}$ will plausibly be a gs-monoidal category (= CD category). Do both of these things seem reasonable?

I think in general, all the little details of figuring out which axioms/properties are needed for what tend to get lost on me.

Make sure to write things up, yes? Even if only for yourself and by hand, and I'm sure it will work, possibly with some persistence.

I'm sure your gs-categories paper will help with that

I don't exactly see the relation to free gs-monoidal categories or free Markov categories. What exactly do you think the connection is?

The final question, which I'm sure is more difficult, is whether or not we can bring conditioning into this.

Yes, that indeed sounds interesting. Let's try to take it one step at a time!

view this post on Zulip Drew Allen McNeely (Jun 02 2022 at 23:15):

Sounds good! I'm not sure why it "gets composed with a Dirac distribution of a deterministic morphism in $\mathsf{C}$". Are you referring to a random variable here, in the sense of a deterministic function on a probability space?

Not a random variable, but rather that most papers I've read describe the state-evolution model as some deterministic function $f\in \mathsf{C}(X,X)$, and a state $x_k$ is a Gaussian mixture that gets pushed through $f$. I was just saying how I had realized that this can be easily set up in the $T\mathsf{C}$ construction: the mixture state is treated as a morphism $T\mathsf{C} \ni x_k: 1\rightarrow X$ which gets composed with $\eta^T_{\mathsf{C}(X,X)}(f) \in T\mathsf{C}$. Now I'm actually simplifying this: the primary reason Gaussian mixtures are used in my field is approximation. $f$ is typically nonlinear, and gets linearized about the mean of each Gaussian mixture component. Then each component is pushed through its respective linearization of $f$. This may be a subject for another thread though.
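This state-evolution step can be sketched in the finite-distribution toy model from earlier in the thread (a hedged illustration: dict-based encoding and names are ours, and the Gaussian case would replace $T$ and $\mathsf{C}$ accordingly):

```python
# Sketch of the state-evolution step in T C, with T the finite distribution
# monad: composing a mixture state 1 -> X with the Dirac distribution eta(f)
# of a deterministic morphism f reduces to an ordinary pushforward. States
# are dicts x -> weight; functions are tuples of outputs.

def eta(f):
    """Unit of T: the Dirac distribution on f."""
    return {f: 1.0}

def push_state(state, df):
    """Compose a mixture state with a mixture of functions df, i.e. the
    composite 1 -> X -> X in T C."""
    out = {}
    for x, p in state.items():
        for f, q in df.items():
            out[f[x]] = out.get(f[x], 0.0) + p * q
    return out
```

Composing with `eta(f)` just relabels the mixture components along `f`, matching the "distribution pushed through a deterministic model" picture.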

BTW, I suspect that the construction isn't quite what one might want yet.

I'm gonna chew on this a bit more :)

What is a monad NT?

Oh I was just referring to the monad natural transformations $\mu$ and $\eta$ for $T$ :blush:

I'm not quite sure what $\tau$ and $\sigma$ are

Here I was referring to the left- and right-strengths $\sigma_{X,Y}: X\otimes TY \rightarrow T(X\otimes Y)$ and $\tau_{X,Y}: TX\otimes Y \rightarrow T(X\otimes Y)$ as seen in the commutative diagram in the [[commutative monad]] article you linked to. I'll check out that appendix though!
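For the finite distribution monad these strengths, and the two composites whose agreement is commutativity, can be written out explicitly. A minimal sketch under the thread's dict-based encoding (all names ours); both composites yield the product distribution, i.e. the laxator:

```python
# Left/right strengths for the finite distribution monad, and the two
# composites mu . T(sigma) . tau and mu . T(tau) . sigma; commutativity of T
# says they agree. Distributions are dicts value -> weight.

def sigma(x, d):
    """Left strength X x TY -> T(X x Y)."""
    return {(x, y): p for y, p in d.items()}

def tau(d, y):
    """Right strength TX x Y -> T(X x Y)."""
    return {(x, y): p for x, p in d.items()}

def mu(dd):
    """Multiplication T T X -> T X, with inner distributions encoded as
    frozensets of (value, weight) pairs so they can serve as dict keys."""
    out = {}
    for inner, p in dd.items():
        for x, q in dict(inner).items():
            out[x] = out.get(x, 0.0) + p * q
    return out

def lax_via_tau(d1, d2):
    """mu . T(sigma) . tau: tau pairs each point of d1 with d2 as a whole,
    then sigma acts inside, then mu flattens."""
    dd = {}
    for x, p in d1.items():
        dd[frozenset(sigma(x, d2).items())] = p
    return mu(dd)

def lax_via_sigma(d1, d2):
    """mu . T(tau) . sigma: the same with the roles of d1 and d2 swapped."""
    dd = {}
    for y, q in d2.items():
        dd[frozenset(tau(d1, y).items())] = q
    return mu(dd)
```

Running both on the same pair of distributions gives the same product distribution, which is the down-to-earth content of the commutativity square.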

I don't exactly see the relation to free gs-monoidal categories or free Markov categories. What do you think it has to do exactly?

Oh I wasn't actually referring to free constructions, more that you talk about gs-monoidal categories in that paper, and maybe it will help me understand what implications come up when we remove the assumption that $1$ is terminal. It's all those types of little details, including the string diagram identities like $\mathrm{swap}\circ \mathrm{copy} = \mathrm{copy}$, that make me ask: why are these there, and what happens if we remove them?

view this post on Zulip Tobias Fritz (Jun 03 2022 at 07:28):

Got it. Then I'm on board with everything. I don't have a complete understanding of words like "state-evolution model", but what you say makes sense to me, so I think I'm following. And I also agree that the existence of conditionals in that category will be an interesting question.

view this post on Zulip Drew Allen McNeely (Aug 04 2023 at 20:21):

@Dario Stein for our reference