Category Theory
Zulip Server
Archive

You're reading the public-facing archive of the Category Theory Zulip server.
To join the server you need an invite. Anybody can get an invite by contacting Matteo Capucci at name dot surname at gmail dot com.
For all things related to this archive refer to the same person.


Stream: theory: categorical probability

Topic: Gaussian mixtures


view this post on Zulip Drew Allen McNeely (May 20 2022 at 05:57):

Hey everyone, @Dario Stein and I were emailing about his paper, but I figured I should post my questions here in case anyone is interested in chiming in:

There's a remark in @Tobias Fritz's paper that "Gauss has the flavors of a Kleisli category," but that no one can seem to find the associated monad and he conjectures that there isn't one.
I've been frustrated by that fact for a while, but your decorated maps construction here seems to scratch that itch. I'm wondering if we can extend it to, e.g., truncated Gaussians on Riemannian manifolds and other more exotic spaces?

Also, I've been curious for a long time about constructing Gaussian mixture models as a composition of types: in particular the weighted list functor on the Gaussian type. Tobias suggested I try looking into distributive laws. It was easy to find the symmetric monoidal compatibility morphisms for composed monads using a distributive law, but again Gauss had the issue of not arising from a monad as a Kleisli category. Further, a distributive law doesn't quite make sense here, as a convex combination of Gaussians makes sense, but a Gaussian distribution of weighted lists does not. Do you think instead there might be a way to push decorated maps through some functor, and construct everything required to make a Markov category in a way that resolves to Gaussian mixtures in a special case?

view this post on Zulip Tobias Fritz (May 20 2022 at 16:58):

So here's a conjectural construction which would let us construct a Markov category of Gaussian mixture models, but at the same time would apply way more generally. Before I state it, let me motivate it a bit and provide some intuition. For a Markov category of Gaussian mixture models, we presumably want the morphisms to be probability distributions over morphisms in $\mathsf{Gauss}$. So is there a general construction that lets us start with a Markov category $\mathsf{C}$ and construct a new Markov category whose morphisms are probability distributions over morphisms of $\mathsf{C}$?

Here's how I think this works. Let $\mathsf{C}$ be any [[commutative monad]] on $\mathsf{Set}$ (or whatever category $\mathsf{C}$ is enriched in), and suppose that $T$ is affine, which means $T1 \cong 1$. Now define a new category $T\mathsf{C}$ with the same objects as $\mathsf{C}$, but where $(T\mathsf{C})(X,Y) := T(\mathsf{C}(X,Y))$. For composition, we should be able to use
$$T(\mathsf{C}(X,Y)) \times T(\mathsf{C}(Y,Z)) \longrightarrow T(\mathsf{C}(X,Y) \times \mathsf{C}(Y,Z)) \to T(\mathsf{C}(X,Z)),$$
where the first morphism is the laxator of $T$ and the second one is induced by composition in $\mathsf{C}$. The monoidal structure should be defined similarly, and $T\mathsf{C}$ should also inherit the Markov structure from $\mathsf{C}$.

Conjecture: This construction makes sense and makes $T\mathsf{C}$ into a Markov category.

Would you want to think about it? If we now take $T$ to be the usual distribution monad and take $\mathsf{C}$ to be $\mathsf{Gauss}$, then we should get a Markov category for Gaussian mixture models.
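For concreteness, the conjectured composition can be sketched for the finite distribution monad, with $\mathsf{C} = \mathsf{FinSet}$. This is a minimal illustrative sketch: the encoding (morphisms as tuples of outputs, distributions as dicts from values to weights) and all names are ours, not from the thread.

```python
# Sketch of the conjectured composition in T C, taking T to be the finite
# distribution monad and C = FinSet. A morphism f : {0,...,m-1} -> {0,...,n-1}
# is encoded as the tuple of its outputs; a distribution is a dict
# mapping values to weights.

def laxator(d1, d2):
    """T X x T Y -> T(X x Y): the product distribution."""
    return {(a, b): p * q for a, p in d1.items() for b, q in d2.items()}

def T(phi, d):
    """Functorial action of T: push d forward along phi, merging outcomes
    that phi identifies."""
    out = {}
    for a, p in d.items():
        out[phi(a)] = out.get(phi(a), 0.0) + p
    return out

def comp(pair):
    """Composition in FinSet on an encoded pair (f : X -> Y, g : Y -> Z)."""
    f, g = pair
    return tuple(g[f[x]] for x in range(len(f)))

def compose_TC(df, dg):
    """Composition in T C: apply the laxator, then T of composition in C."""
    return T(comp, laxator(df, dg))
```

For example, composing the uniform mixture of the identity and the swap on the two-element set with itself returns the same mixture, since the four composites pair off into identities and swaps.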

view this post on Zulip Sam Staton (May 20 2022 at 17:03):

Interesting! Presumably C needs to be an affine monad? or did I miss a trick?

view this post on Zulip Tobias Fritz (May 20 2022 at 17:12):

Of course, thanks! I've fixed that above.

view this post on Zulip Sam Staton (May 20 2022 at 17:48):

Great. Might it still be related to a distributive law? Iirc any Markov category $\mathsf C$ fully embeds in $Kl(G)$ for $G$ a monad on $[{\mathsf C}_{det}^{op},Set]$. And your monad $T$ on $Set$ can be regarded pointwise as a monad on $[{\mathsf C}_{det}^{op},Set]$. And maybe you've essentially given a canonical distributive law of $G$ over $T$, so that your category $T\mathsf C$ fully embeds in $Kl(TG)$.

view this post on Zulip Tobias Fritz (May 20 2022 at 18:34):

Nice! I don't think I have a good enough understanding of how that embedding works to think about whether that hypothesis is plausible. In other words, I'm being too dense :upside_down:

view this post on Zulip Sam Staton (May 20 2022 at 18:47):

I'll take "dense" as a pun :smiley: . I just meant that using the Yoneda lemma, and writing $y$ for the Yoneda embedding, and $\hat T$ for the monad on $[{\mathsf C}_{det}^{op},Set]$ that is $T$ pointwise,

$$Kl(\hat TG)(yX,yY) = [{\mathsf C}_{det}^{op},Set](yX,\hat T(G(yY))) = \hat T(G(yY))(X) = T(G(yY)(X)) = T(Kl(G)(yX,yY)) = T(\mathsf{C}(X,Y)).$$

view this post on Zulip Drew Allen McNeely (May 25 2022 at 21:21):

Tobias Fritz said:

For a Markov category of Gaussian mixture models, we presumably want the morphisms to be probability distributions over morphisms in $\mathsf{Gauss}$.

Yes, indeed! That's how I tried constructing it anyways, but I didn't think to work with the hom-objects. I was still trying to work with morphisms in $\mathsf{Gauss}$ as pseudo-Kleisli maps in $\mathsf{Set}$, and had no idea how to represent distributions of morphisms.

I should remark, in my field Gaussian mixtures are not represented this way, but thinking about it, I think they're just a special case of your construction. It's typically only the distribution that's modeled as a convex combination of Gaussians, and they're pushed through a deterministic nonlinear model (which is either linearized about the mean of each Gaussian component, or it's approximated using the unscented transform). But the distribution would just be an element of $\mathsf{C}(1,X)$, which gets composed with a Dirac distribution of a deterministic morphism in $\mathsf{C}$.

I might restate everything just for my own understanding:

Let $\mathsf{C}$ be any [[commutative monad]] on $\mathsf{Set}$

Did you mean that $T$ is a commutative monad? I'm having a hard time understanding how $\mathsf{C}$ or its hom-functor can be treated as a monad, unless I'm missing something? Also I think $T$ needs the monad NTs for the following:

I'm just now reading about commutative monads. By laxator, do you mean the compositions $\mu \circ T\tau \circ \sigma = \mu \circ T\sigma \circ \tau$ in the commutative diagram for commutative monads? I'm assuming that for the finite distribution monad, this would take $\left(\{w_i, f_i\}_i\ , \{v_j, g_j\}_j\right) \mapsto \{w_i v_j\ , (f_i, g_j)\}_{ij}$, i.e. would give you all possible combinations of compositions/bimaps? (Edit: wait, this is literally just the $\nabla$ compatibility morphism from page 17 of your synthetic paper)

For the Markov structure, do we just use Dirac distributions of the Markov morphisms from $\mathsf{C}$? I.e. $\mathrm{copy}^{T\mathsf{C}}_X = \eta^T_{\mathsf{C}(X,X\otimes X)}(\mathrm{copy}^{\mathsf{C}}_X)$ and so forth?

Where does the affineness of $T$ come in? I think in general, all the little details of figuring out which axioms/properties are needed for what tend to get lost on me. I'm sure your gs-categories paper will help with that, so I should try to go back and read it again :sweat_smile:

The final question, which I'm sure is more difficult, is whether or not we can bring conditioning into this. I might try to chew on it! Thanks for the boost though :)

view this post on Zulip Tobias Fritz (May 26 2022 at 13:12):

Hi! Great to see this going!

Drew Allen McNeely said:

I should remark, in my field Gaussian mixtures are not represented this way, but thinking about it, I think they're just a special case of your construction. It's typically only the distribution that's modeled as a convex combination of Gaussians, and they're pushed through a deterministic nonlinear model (which is either linearized about the mean of each Gaussian component, or it's approximated using the unscented transform). But the distribution would just be an element of $\mathsf{C}(1,X)$, which gets composed with a Dirac distribution of a deterministic morphism in $\mathsf{C}$.

Sounds good! I'm not sure why it "gets composed with a Dirac distribution of a deterministic morphism in $\mathsf{C}$". Are you referring to a random variable here, in the sense of a deterministic function on a probability space?

BTW, I suspect that the construction isn't quite what one might want yet. Assuming that it works, I think that it works well as far as morphisms $1 \to X$ are concerned. But a general morphism $A \to X$ contains a bit more data than we may want. For example, if we apply the construction to $\mathsf{FinSet}$ instead and aim to construct $\mathsf{FinStoch}$ like that, then we get two different morphisms $2 \to 2$ which coincide in $\mathsf{FinStoch}$, namely one which is the uniform mixture of the identity and the swap, and the other being the uniform mixture of the two constant functions. So we still need to form a certain quotient in order to arrive at $\mathsf{FinStoch}$.
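That $\mathsf{FinSet}$ example can be checked concretely. In the same illustrative encoding as before (functions as tuples of outputs, mixtures as dicts from functions to weights; names are ours), the two mixtures are distinct as elements of $T(\mathsf{FinSet}(2,2))$ but induce the same stochastic matrix:

```python
# Two distinct mixtures of functions 2 -> 2 that induce the same stochastic
# matrix, i.e. the same morphism of FinStoch. Functions are tuples of
# outputs; mixtures are dicts mapping function -> weight.

identity, swap = (0, 1), (1, 0)
const0, const1 = (0, 0), (1, 1)

mix1 = {identity: 0.5, swap: 0.5}   # uniform mixture of identity and swap
mix2 = {const0: 0.5, const1: 0.5}   # uniform mixture of the two constants

def kernel(mix, n=2):
    """The stochastic matrix P(y|x) of a mixture of functions: this is the
    quotient map to FinStoch under which mix1 and mix2 coincide."""
    return [[sum(w for f, w in mix.items() if f[x] == y) for y in range(n)]
            for x in range(n)]
```

Both mixtures map to the uniform kernel, so the quotient really does identify them.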

I might restate everything just for my own understanding:

Let $\mathsf{C}$ be any [[commutative monad]] on $\mathsf{Set}$
Did you mean that $T$ is a commutative monad?

Whoops :hushed: Yes, that's what I meant!

Also I think $T$ needs the monad NTs for the following:

What is a monad NT?

By laxator, do you mean the compositions $\mu \circ T\tau \circ \sigma = \mu \circ T\sigma \circ \tau$ in the commutative diagram for commutative monads?

I'm not quite sure what $\tau$ and $\sigma$ are, but yes, that's probably what I mean. In more down-to-earth terms, I mean the natural transformation with components $TX \times TY \longrightarrow T(X \times Y)$, which intuitively corresponds to the formation of product measures. If you haven't already seen it, Appendix C in here may be one place to read about this in the probability context.

I'm assuming that for the finite distribution monad, this would take $\left(\{w_i, f_i\}_i\ , \{v_j, g_j\}_j\right) \mapsto \{w_i v_j\ , (f_i, g_j)\}_{ij}$, i.e. would give you all possible combinations of compositions/bimaps? (Edit: wait, this is literally just the $\nabla$ compatibility morphism from page 17 of your synthetic paper)

Exactly :100:

For the Markov structure, do we just use Dirac distributions of the Markov morphisms from $\mathsf{C}$? I.e. $\mathrm{copy}^{T\mathsf{C}}_X = \eta^T_{\mathsf{C}(X,X\otimes X)}(\mathrm{copy}^{\mathsf{C}}_X)$ and so forth?

Yes, that's what I'm hoping will work.

Where does the affineness of $T$ come in?

That is relevant (at least) for making sure that the new category $T\mathsf{C}$ will again be Markov, i.e. that $1$ will again be terminal: we want $T\mathsf{C}(X,1) \cong T1 \cong 1$. Without that, maybe everything else still works in the sense that $T\mathsf{C}$ will plausibly be a gs-monoidal category (= CD category). Do both of these things seem reasonable?

I think in general, all the little details of figuring out which axioms/properties are needed for what tend to get lost on me.

Make sure to write things up, yes? Even if only for yourself and by hand, and I'm sure it will work, possibly with some persistence.

I'm sure your gs-categories paper will help with that

I don't exactly see the relation to free gs-monoidal categories or free Markov categories. What exactly do you think the connection is?

The final question, which I'm sure is more difficult, is whether or not we can bring conditioning into this.

Yes, that indeed sounds interesting. Let's try to take it one step at a time!

view this post on Zulip Drew Allen McNeely (Jun 02 2022 at 23:15):

Sounds good! I'm not sure why it "gets composed with a Dirac distribution of a deterministic morphism in $\mathsf{C}$". Are you referring to a random variable here, in the sense of a deterministic function on a probability space?

Not a random variable, but rather that most papers I've read describe the state-evolution model as some deterministic function $f\in \mathsf{C}(X,X)$, and a state $x_k$ is a Gaussian mixture that gets pushed through $f$. I was just saying how I had realized that this can be easily set up in the $T\mathsf{C}$ construction: the mixture state is treated as a morphism $T\mathsf{C} \ni x_k: 1\rightarrow X$ which gets composed with $\eta^T_{\mathsf{C}(X,X)}(f) \in T\mathsf{C}$. Now I'm actually simplifying this: the primary reason Gaussian mixtures are used in my field is approximation. $f$ is typically nonlinear, and gets linearized about the mean of each Gaussian mixture component. Then each component is pushed through its respective linearization of $f$. This may be a subject for another thread though.
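This state-evolution step can be sketched in the finite-distribution toy model from earlier in the thread (a hedged illustration: dict-based encoding and names are ours, and the Gaussian case would replace $T$ and $\mathsf{C}$ accordingly):

```python
# Sketch of the state-evolution step in T C, with T the finite distribution
# monad: composing a mixture state 1 -> X with the Dirac distribution eta(f)
# of a deterministic morphism f reduces to an ordinary pushforward. States
# are dicts x -> weight; functions are tuples of outputs.

def eta(f):
    """Unit of T: the Dirac distribution on f."""
    return {f: 1.0}

def push_state(state, df):
    """Compose a mixture state with a mixture of functions df, i.e. the
    composite 1 -> X -> X in T C."""
    out = {}
    for x, p in state.items():
        for f, q in df.items():
            out[f[x]] = out.get(f[x], 0.0) + p * q
    return out
```

Composing with `eta(f)` just relabels the mixture components along `f`, matching the "distribution pushed through a deterministic model" picture.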

BTW, I suspect that the construction isn't quite what one might want yet.

I'm gonna chew on this a bit more :)

What is a monad NT?

Oh I was just referring to the monad natural transformations $\mu$ and $\eta$ for $T$ :blush:

I'm not quite sure what $\tau$ and $\sigma$ are

Here I was referring to the left- and right-strengths $\sigma_{X,Y}: X\otimes TY \rightarrow T(X\otimes Y)$ and $\tau_{X,Y}: TX\otimes Y \rightarrow T(X\otimes Y)$ as seen in the commutative diagram in the [[commutative monad]] article you linked to. I'll check out that appendix though!
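For the finite distribution monad these strengths, and the two composites whose agreement is commutativity, can be written out explicitly. A minimal sketch under the thread's dict-based encoding (all names ours); both composites yield the product distribution, i.e. the laxator:

```python
# Left/right strengths for the finite distribution monad, and the two
# composites mu . T(sigma) . tau and mu . T(tau) . sigma; commutativity of T
# says they agree. Distributions are dicts value -> weight.

def sigma(x, d):
    """Left strength X x TY -> T(X x Y)."""
    return {(x, y): p for y, p in d.items()}

def tau(d, y):
    """Right strength TX x Y -> T(X x Y)."""
    return {(x, y): p for x, p in d.items()}

def mu(dd):
    """Multiplication T T X -> T X, with inner distributions encoded as
    frozensets of (value, weight) pairs so they can serve as dict keys."""
    out = {}
    for inner, p in dd.items():
        for x, q in dict(inner).items():
            out[x] = out.get(x, 0.0) + p * q
    return out

def lax_via_tau(d1, d2):
    """mu . T(sigma) . tau: tau pairs each point of d1 with d2 as a whole,
    then sigma acts inside, then mu flattens."""
    dd = {}
    for x, p in d1.items():
        dd[frozenset(sigma(x, d2).items())] = p
    return mu(dd)

def lax_via_sigma(d1, d2):
    """mu . T(tau) . sigma: the same with the roles of d1 and d2 swapped."""
    dd = {}
    for y, q in d2.items():
        dd[frozenset(tau(d1, y).items())] = q
    return mu(dd)
```

Running both on the same pair of distributions gives the same product distribution, which is the down-to-earth content of the commutativity square.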

I don't exactly see the relation to free gs-monoidal categories or free Markov categories. What do you think it has to do exactly?

Oh I wasn't actually referring to free constructions, more that you talk about gs-monoidal categories in that paper, and maybe it will help me understand what implications come up when we remove the assumption that $1$ is terminal. It's all those types of little details, including the string diagram identities like $\mathrm{swap}\circ \mathrm{copy} = \mathrm{copy}$, that make me ask: why are these there, and what happens if we remove them?

view this post on Zulip Tobias Fritz (Jun 03 2022 at 07:28):

Got it. Then I'm on board with everything. I don't have a complete understanding of words like "state-evolution model", but what you say makes sense to me, so I think I'm following. And I also agree that the existence of conditionals in that category will be an interesting question.

view this post on Zulip Drew Allen McNeely (Aug 04 2023 at 20:21):

@Dario Stein for our reference