Category Theory
Zulip Server
Archive

You're reading the public-facing archive of the Category Theory Zulip server.
To join the server you need an invite. Anybody can get an invite by contacting Matteo Capucci at name dot surname at gmail dot com.
For all things related to this archive refer to the same person.

Stream: theory: categorical probability

Topic: What is normalisation?

Nathaniel Virgo (Jan 02 2021 at 03:31):

This is kind of a silly question, but I keep coming back to it, and thought maybe it's worth starting a conversation. In short the question is, if we work in a category of unnormalised Markov kernels, how can we characterise the operation of normalising an unnormalised kernel? It's not a functor in the most obvious sense, but perhaps there is some other 'nice' way to approach it category-theoretically?

I'll write only about the finite case below, but if this can be / has been resolved in a more general measure-theoretic context I'll be very happy.

A simple example of a Markov category is $\mathsf{FinStoch}$ , where objects are finite sets and a morphism $X\xrightarrow{f}Y$ is a Markov kernels, which defines a normalised probability distribution over $Y$ for each element of $X$ . However, sometimes it's helpful to work with unnormalised distributions instead - I'll call the resulting category $\mathsf{UFinStoch}$ .

$\mathsf{FinStoch}$ is a Markov category as defined by Fritz, meaning that every object has a "copy" and a "delete" operation, where delete is natural but copy isn't. $\mathsf{UFinStoch}$ is an "unnormalised Markov category", meaning that delete isn't necessarily natural either. If we take $\mathsf{UFinStoch}$ and restrict to only those morphisms $X\xrightarrow{f}Y$ for which $f;\mathrm{del}_Y = \mathrm{del}_X$ , then we get $\mathsf{FinStoch}$ .

But what if we want to take an arbitrary unnormalised kernel and then normalise it? This seems a fairly common thing to want to do - when we work in an unnormalised context, we usually want to turn it back into a normalised distribution in the end. We can define a function

$\mathcal{N}\colon \mathsf{UFinStoch}(X,Y) \to \mathsf{FinStoch}(X,Y)$

such that

$\mathcal{N}(f)(y|x) = \frac{f(y|x)}{\sum_{y'\in Y} f(y'|x)}.$

$\mathcal{N}$ is a (partial) map from morphisms in $\mathsf{UFinStoch}$ to $\mathsf{FinStoch}$ , but it isn't a functor, since in general

$\mathcal{N}(f;g) \ne \mathcal{N}(f);\mathcal{N}(g),$

even when all three maps are defined.

So the question is, given that normalisation isn't a functor in this sense, what is it? It seems to be sort of badly behaved from a category theoretic point of view, but I have the feeling this might be because I'm looking at it wrong. Is normalisation a functor in some other, different, sense? Or is there some way to view it as being defined by a universal property, or as being the result of some other "nice" category-theoretic construction?

Recently I had the idea that perhaps there should be a double category, or even an equipment, where the 'proarrows' are unnormalised Markov kernels and the other arrows are the normalised ones. This is intuitively appealing because I think of normalised kernels as stochastic functions, whereas unnormalised kernels have more of a "relation-like" feel. I couldn't get it to work though, because I don't know what the 2-cells should be.

Anyway, I'll be happy if someone else has thought about this as well and has anything they can share.

Tobias Fritz (Jan 02 2021 at 08:48):

In order to find a potential universal property of normalization, how about first finding some purely equational properties that it enjoys instead of the functoriality? What could those be?

One potentially useful piece of additional structure on UFinStoch is that the hom-sets are partially ordered, simply by comparing matrices entrywise. So could the 2-cells in your double category just be these ordering relations?

BTW how do you imagine dealing with the nuisance of zero denominators? Just exclude the offending morphisms from UFinStoch?

Nathaniel Virgo (Jan 02 2021 at 09:50):

One equational property is that every non-zero morphism in $\mathsf{UFinStoch}$ decomposes uniquely like this

image.png

where $\mathcal{N}(f)$ is normalised. That could serve as a definition of normalisation, but it's not very clear to me where to go with it after that.

I thought about having the 2-cells be that ordering relation, but unless I made a mistake (which is entirely possible - I'll double check tomorrow) it didn't seem to work. We can start with a (strict) 2-category defined that way. I hoped that normalised kernels would have right adjoints, which would allow us to extend it to a double category in a nice way, which would give an equipment. However, it seemed that the only things that have right adjoints in that 2-category are deterministic injective functions, so that didn't work. I was quite surprised by that.

There might be some other, non-equipment way to make it a double category though. We'd need some way to extend that partial order to squares like this:

image.png

About the zero denominators - I'm not sure about that. I guess I'm hoping the right way to deal them will become obvious at some point.

Tobias Fritz (Jan 02 2021 at 10:34):

[Message erased, since you already mentioned the same thing, sorry.]

Matteo Capucci (he/him) (Jan 02 2021 at 15:09):

Is there something like a morphism of monads from $U \to D$ where the first is the monad of unnormalized distributions and the second is the monad of normalized distributions?

Paolo Perrone (Jan 02 2021 at 16:17):

One thing that I've always been struggling with is defining normalization as a sort of quotient operation with respect to rescaling, analogous to projectivization in geometry. A space of (non-negative) measures is then turned into a space of probability measures (the "rays"), plus a zero measure. This is also where conditioning should take values: conditioning on events of zero measure becomes a feature and not a bug.
If anyone has any helpful ideas about going this way, I'm all ears!

Javier Prieto (Jan 04 2021 at 12:44):

That's essentially the trick behind projection-valued measures, right? You start by associating a measure to every vector in a Hilbert space and then projectivize to get the usual PVM.

Christoph Thies (Jan 15 2021 at 05:15):

May I use this thread to ask a question about normalisation for the distribution monad, or finitary Giry monad, as in Fritz and Perrone (2018) , Section 6.2 (where the monad called $P$ below is called $D$ )?

I would like to turn a finite family $I$ of samples in $Z\colon \mathsf{FinSet}$ , that is a map $\hat{a}\colon I\rightarrow Z,$ into a distribution $a\colon PZ$ . With the operation of taking convex combinations described in Fritz and Perrone (2018), I could count the number of occurrences of each $Z$ value in the image of $\hat{a}$ and form a convex sum that is weighted accordingly, i.e. normalised, to get $a\colon PZ$ . How can I describe this operation categorically?

I am asking this question in this thread because it seems to me that normalisation should 'automatise' this operation roughly like this. We can consider $\hat{a}$ as element of the product $\prod_{i\colon I}Z$ and apply the unit of the unnormalised monad $U$ to get an element of $\prod_{i\colon I}UZ$ of which we add up the factors to get an unnormalised distribution in $UZ$ . Normalisation turns this into a probability distribution in $PZ$ . Is this an application of the normalisation @Nathaniel Virgo referred to above?

Tobias Fritz (Jan 16 2021 at 06:13):

Right! This is a special case of the normalisation that @Nathaniel Virgo referred to above. And it's a particularly nice special case, because it is natural: for every $Z$ it's described by a map $Z^I \to PZ$ , and this makes the naturality square with respect to any map $Z \to Z'$ commute. The point is that there's no need to count the number of occurences of each value in the image and weigh accordingly; for $f : I \to Z$ , the associated normalized distribution is just $\frac{1}{|I|}\sum_{i \in I} \delta_{f(i)}$ , which autmatically gives the right thing even in case that some of the values coincide. And what makes it natural in $Z$ is that the denominator is fixed.

Christoph Thies (Jan 22 2021 at 06:00):

I have another question if I may. The problem seems simple so I might be overcomplicating things. Also, I am sure for the initiated these ideas are contained in Jacobs (2019). I am trying to express them so that I understand.

Let's consider the above map $Z^J\to PZ$ that normalises a set of samples as composite $Z^J\xrightarrow{R}UZ\xrightarrow{N}PZ$ . I would like a function $\underline{w}\colon Z\to \mathbb{N}$ to induce functions $w\colon UZ\to UZ$ , $\nu\colon PZ\to PZ$ , and $s\colon Z^J\to Z^K$ such that the diagram below commutes.
selection-1.png

The map $w\colon UZ\to UZ$ is in a sense diagonal and integral. For $p\colon UZ$ (i.e., $p\colon Z\to \mathbb{R}_{\geq 0}$ ) define $w(p)(z) = \underline{w}(z)p(z)$ . For $z\colon Z$ , $w$ sends $\delta_z$ to $\underline{w}(z)\delta_z$ .

The map $\nu\colon PZ\to PZ$ is defined by forgetting the normalisation, applying $w$ , and normalising with $N\colon UZ\to PZ$ .

As to the map $s$ , I'm not sure I can say this correctly, but I would like to replace $Z^J$ by a category $\mathsf{Pop_Z}$ that contains all finite populations, that is functions $Z^K$ for any $K\colon\mathsf{FinSet}$ . I imagine this category as an integer version of $UZ$ where the coefficients in the formal sums are natural numbers, and I believe it is the same as the category of multisets in Jacobs (2019). Then I would like to say that $s\colon\mathsf{Pop_Z}\rightarrow\mathsf{Pop_Z}$ is a functor induced by $\underline{w}$ such that the diagram above commutes for all $J\colon\mathsf{FinSet}$ .

I would like to define $\mathsf{Pop_Z}$ as the free symmetric monoidal category generated by the elements of $Z$ , that is the functions $\{*\}\rightarrow Z$ . The monoidal unit is the empty function from the empty set, the monoidal product of two functions $a\colon Z^J$ and $b\colon Z^K$ is the combined function on the disjoint union $J\cup K$ , $a\otimes b\colon Z^{J\cup K}$ .

Given $\underline{w}\colon Z\to\mathbb{N}$ , define $s\colon\mathsf{Pop}_Z\to\mathsf{Pop}_Z$ as symmetric monoidal functor on the generators $\{\{*\}\to z, z\colon Z\}$ by

$s(z) =\underbrace{z\otimes z\otimes \cdots\otimes z}_{\underline{w}(z) factors}.$

In the end, I would like to be able to relate functors $\mathsf{Pop}_Z\to\mathsf{Pop}_Z$ with maps $PZ\to PZ$ , so that a concrete process $s\colon Z^J\to Z^K$ in the diagram above is an instance of a 'law' $\underline{w}\colon Z\to\mathbb{N}$ . Does this make any sense?

Christoph Thies (Jan 28 2021 at 02:34):

I apologise if the following is a bit messy. I aim to be precise but am not certain at places. The connection with normalisation appears towards the end.

From the above, it seems not too far fetched to define a symmetric monoidal category of metapopulations $\mathsf{Pop}^2_Z = \mathsf{Pop}_{\mathsf{Pop}_Z}$ as free symmetric monoidal category generated by populations of elements of $Z$ , that is instances of $\mathsf{Pop}_Z$ .

There is a symmetric monoidal functor $\mu\colon\mathsf{Pop}^2_Z\to\mathsf{Pop}_Z$ given on generators $\overline{z} = z_1\otimes\cdots\otimes z_n\colon\mathsf{Pop}^2_Z$ by $\mu (\overline{z}) = z_1\otimes\cdots\otimes z_n\colon\mathsf{Pop}_Z$ , as in the picture below.

Addition.jpg

Given a law $\underline{w}\colon Z\to\mathbb{N}$ with associated functor $s\colon\mathsf{Pop}_Z\to\mathsf{Pop}_Z, s(z) = \underline{w}(z)z\colon\mathsf{Pop}_Z$ , we can define a functor $\mathsf{Pop}_s\colon\mathsf{Pop}^2_Z\to\mathsf{Pop}^2_Z$ on generators $\overline{z}\colon\mathsf{Pop}_Z$ by

$\mathsf{Pop}_s (\overline{z}) = s(\overline{z})\colon\mathsf{Pop}^2_Z$

that satisfies

$\mu\circ\mathsf{Pop}_s = s\circ\mu\colon\mathsf{Pop}^2_Z\to\mathsf{Pop}_Z,$

that is the induced transformation ignores the metapopulation structure.

On the other hand, we may again consider 'laws' $\overline{w}\colon\mathsf{Pop}_Z\to\mathbb{N}$ , this time for populations rather than $Z\to\mathbb{N}$ for samples, that give rise to functors $s\colon\mathsf{Pop}^2_Z\to\mathsf{Pop}^2_Z$ that send a population $\overline{z}\colon\mathsf{Pop}_Z$ to a metapopulation of $\overline{w}(\overline{z})$ copies of itself,

$s(\overline{z}) = \overline{w}(\overline{z})\overline{z}\colon\mathsf{Pop}^2_Z.$

Finally, another way of applying a law $\underline{w}\colon Z\to\mathbb{N}$ in a metapopulation is to apply it internal to the populations while keeping the 'size', or magnitude, of each population fixed. The magnitude of a population is the symmetric monoidal functor $|\cdot|\colon\mathsf{Pop}_Z\to\mathbb{N}$ given on generators by

$|z|=1\colon\mathbb{N}, z\colon Z$

(I think the word 'magnitude' for the above map is consistent with the definition in Tom Leinster, 'Entropy and Diversity', as the discrete case where points are either different or the same, e.g., in Example 6.4.6. iii. on p. 213). Unfortunately, since transformations $s\colon\mathsf{Pop}_Z\to\mathsf{Pop}_Z$ generally don't preserve magnitude, populations are replaced by distributions and I don't know how to draw them analogous to the integral instances in the picture above. But I think with

$M\colon\mathsf{Pop}^2_Z\to\mathsf{Pop}^2_Z, M(\overline{z}) = |\overline{z}|\overline{z}\colon\mathsf{Pop}^2_Z \text{ for }\overline{z}\colon\mathsf{Pop}_Z,$

the map that applies the transformation $\underline{w}\colon Z\to\mathbb{N}$ internal to the populations while keeping their size relative to each other constant could be written like this

$\mathsf{Pop}^2_Z\xrightarrow{M}\mathsf{Pop}^2_Z\xrightarrow{\mathsf{Pop}_s}\mathsf{Pop}^2_Z\xrightarrow{\mathsf{Pop}_N}\mathsf{Pop}_{PZ}$

with normalisation $N\colon\mathsf{Pop}_Z\to PZ$ as further above. Here the magnitude of a population is translated into its multiplicity such that after normalisation is applied to each of its transformed copies separately, the magnitude is unchanged while elements in $\mathsf{Pop}_Z$ have been exchanged to units of a different kind, namely $PZ$ .

Do these constructions seem roughly sensible? I am aware that the above can be described in terms of more general structures and constructions, but I am trying to understand from a specific viewpoint that I am happy to explain.

Nathaniel Virgo (Jan 28 2021 at 04:05):

What's a 'law' in this context? (And am I guessing correctly that "population" means the same thing as "multiset" or "free commutative monoid"?)

Christoph Thies (Jan 28 2021 at 04:33):

Nathaniel Virgo said:

And am I guessing correctly that "population" means the same thing as "multiset" or "free commutative monoid"?

Yes, a population is a multiset or an element of the commutative monoid generated freely by the elements of $Z$ . A metapopulation is an element of the commutative monoid generated freely by populations of elements of $Z$ .

A law (the term has no meaning to me other than that a law determines the outcome of a class of processes) $Z\to\mathbb{N}$ or $\mathsf{Pop}_Z\to\mathbb{N}$ determines by how many copies of itself an individual $z\colon Z$ or a population $\overline{z}\colon\mathsf{Pop}_Z$ is replaced during the associated process $\mathsf{Pop}_Z\to\mathsf{Pop}_Z$ or $\mathsf{Pop}^2_Z\to\mathsf{Pop}^2_Z$ . These laws are to represent absolute fitness, or number of offspring, of an individual or population when heredity is perfect, that is when offspring is identical to the parent. More importantly, $Z\to\mathbb{N}$ determines the corresponding process $\mathsf{Pop}_Z\to\mathsf{Pop}_Z$ individually for elements of $Z$ , and likewise $\mathsf{Pop}^2_Z\to\mathsf{Pop}^2_Z$ is determined individually for populations of elements of $Z$ .

Nathaniel Virgo (Jan 28 2021 at 04:56):

Right, I got it, so this getting somewhere towards the Price equation from population genetics. I've also wondered a bit about how normalisation relates to that, because you can think about an unnormalised kernel between countable sets as being like a Markov kernel associated with a fitness "law" in this sense. (Except that it takes values in $\mathbb{R}_{\ge 0}$ rather than $\mathbb{N}$ ). In that context a normalised kernel is one where every individual has exactly one offspring, so in some sense "selection" is what makes a difference between normalised kernels and unnormalised ones.

I don't have a specific comment on your constructions currently, except to say I think it's an interesting direction.

Christoph Thies (Jan 28 2021 at 05:18):

Thank you, @Nathaniel Virgo!

Nathaniel Virgo said:

you can think about an unnormalised kernel between countable sets as being like a Markov kernel associated with a fitness "law" in this sense. (Except that it takes values in $\mathbb{R}_{\ge 0}$ rather than $\mathbb{N}$ ).

Yes, laws $Z\to\mathbb{N}$ correspond to 'diagonal' kernels from the parent to the offspring population. I chose $\mathbb{N}$ rather than $\mathbb{R}_{\geq 0}$ because it connects with the concept of reproduction of biological individuals, such as giraffes, in a classical sense. For populations, $\mathsf{Pop}_Z\to\mathbb{N}$ should be replaced by $\mathsf{Pop}_Z\to\mathbb{R}_{\geq 0}$ since absolute fitness for populations may be usefully imagined as expressing persistence, that is probability of future existence (in specified circumstances), rather than number of offspring.

Nathaniel Virgo (Jan 28 2021 at 05:25):

I agree, both are relevant for biology. I guess the most general model of population dynamics would be a stochastic map from (integer-valued) populations to integer-valued populations. Then the expected number of offspring is in $\mathbb{R}$, but the distribution can be important too. There's some stuff about that in this paper from a Price equation related perspective.

Christoph Thies (Jan 28 2021 at 05:32):

Nathaniel Virgo said:

Then the expected number of offspring is in $\mathbb{R}$ , but the distribution can be important too.

Yes, it may be important. However, I assume deterministic fitness. I think the kernels appear because the maps $\mathsf{Pop}_Z\to\mathsf{Pop}_Z$ are induced by Kleisli morphisms $Z\to\mathsf{Pop}_Z$ .

Nathaniel Virgo (Jan 28 2021 at 05:37):

Sorry, I was just chatting about possible other things to consider - I realise there's a lot still to be said even if you don't consider the stochastic element at all.