Category Theory
Zulip Server
Archive

You're reading the public-facing archive of the Category Theory Zulip server.
To join the server you need an invite. Anybody can get an invite by contacting Matteo Capucci at name dot surname at gmail dot com.
For all things related to this archive refer to the same person.


Stream: theory: categorical probability

Topic: What is normalisation?


view this post on Zulip Nathaniel Virgo (Jan 02 2021 at 03:31):

This is kind of a silly question, but I keep coming back to it, and thought maybe it's worth starting a conversation. In short the question is, if we work in a category of unnormalised Markov kernels, how can we characterise the operation of normalising an unnormalised kernel? It's not a functor in the most obvious sense, but perhaps there is some other 'nice' way to approach it category-theoretically?

I'll write only about the finite case below, but if this can be / has been resolved in a more general measure-theoretic context I'll be very happy.

A simple example of a Markov category is FinStoch\mathsf{FinStoch}, where objects are finite sets and a morphism XfYX\xrightarrow{f}Y is a Markov kernels, which defines a normalised probability distribution over YY for each element of XX. However, sometimes it's helpful to work with unnormalised distributions instead - I'll call the resulting category UFinStoch\mathsf{UFinStoch}.

FinStoch\mathsf{FinStoch} is a Markov category as defined by Fritz, meaning that every object has a "copy" and a "delete" operation, where delete is natural but copy isn't. UFinStoch\mathsf{UFinStoch} is an "unnormalised Markov category", meaning that delete isn't necessarily natural either. If we take UFinStoch\mathsf{UFinStoch} and restrict to only those morphisms XfYX\xrightarrow{f}Y for which f;delY=delXf;\mathrm{del}_Y = \mathrm{del}_X, then we get FinStoch\mathsf{FinStoch}.

But what if we want to take an arbitrary unnormalised kernel and then normalise it? This seems a fairly common thing to want to do - when we work in an unnormalised context, we usually want to turn it back into a normalised distribution in the end. We can define a function

N ⁣:UFinStoch(X,Y)FinStoch(X,Y)\mathcal{N}\colon \mathsf{UFinStoch}(X,Y) \to \mathsf{FinStoch}(X,Y)

such that

N(f)(yx)=f(yx)yYf(yx).\mathcal{N}(f)(y|x) = \frac{f(y|x)}{\sum_{y'\in Y} f(y'|x)}.

N\mathcal{N} is a (partial) map from morphisms in UFinStoch\mathsf{UFinStoch} to FinStoch\mathsf{FinStoch}, but it isn't a functor, since in general

N(f;g)N(f);N(g),\mathcal{N}(f;g) \ne \mathcal{N}(f);\mathcal{N}(g),

even when all three maps are defined.

So the question is, given that normalisation isn't a functor in this sense, what is it? It seems to be sort of badly behaved from a category theoretic point of view, but I have the feeling this might be because I'm looking at it wrong. Is normalisation a functor in some other, different, sense? Or is there some way to view it as being defined by a universal property, or as being the result of some other "nice" category-theoretic construction?

Recently I had the idea that perhaps there should be a double category, or even an equipment, where the 'proarrows' are unnormalised Markov kernels and the other arrows are the normalised ones. This is intuitively appealing because I think of normalised kernels as stochastic functions, whereas unnormalised kernels have more of a "relation-like" feel. I couldn't get it to work though, because I don't know what the 2-cells should be.

Anyway, I'll be happy if someone else has thought about this as well and has anything they can share.

view this post on Zulip Tobias Fritz (Jan 02 2021 at 08:48):

In order to find a potential universal property of normalization, how about first finding some purely equational properties that it enjoys instead of the functoriality? What could those be?

One potentially useful piece of additional structure on UFinStoch is that the hom-sets are partially ordered, simply by comparing matrices entrywise. So could the 2-cells in your double category just be these ordering relations?

BTW how do you imagine dealing with the nuisance of zero denominators? Just exclude the offending morphisms from UFinStoch?

view this post on Zulip Nathaniel Virgo (Jan 02 2021 at 09:50):

One equational property is that every non-zero morphism in UFinStoch\mathsf{UFinStoch} decomposes uniquely like this

image.png

where N(f)\mathcal{N}(f) is normalised. That could serve as a definition of normalisation, but it's not very clear to me where to go with it after that.

I thought about having the 2-cells be that ordering relation, but unless I made a mistake (which is entirely possible - I'll double check tomorrow) it didn't seem to work. We can start with a (strict) 2-category defined that way. I hoped that normalised kernels would have right adjoints, which would allow us to extend it to a double category in a nice way, which would give an equipment. However, it seemed that the only things that have right adjoints in that 2-category are deterministic injective functions, so that didn't work. I was quite surprised by that.

There might be some other, non-equipment way to make it a double category though. We'd need some way to extend that partial order to squares like this:

image.png

About the zero denominators - I'm not sure about that. I guess I'm hoping the right way to deal them will become obvious at some point.

view this post on Zulip Tobias Fritz (Jan 02 2021 at 10:34):

[Message erased, since you already mentioned the same thing, sorry.]

view this post on Zulip Matteo Capucci (he/him) (Jan 02 2021 at 15:09):

Is there something like a morphism of monads from UDU \to D where the first is the monad of unnormalized distributions and the second is the monad of normalized distributions?

view this post on Zulip Paolo Perrone (Jan 02 2021 at 16:17):

One thing that I've always been struggling with is defining normalization as a sort of quotient operation with respect to rescaling, analogous to projectivization in geometry. A space of (non-negative) measures is then turned into a space of probability measures (the "rays"), plus a zero measure. This is also where conditioning should take values: conditioning on events of zero measure becomes a feature and not a bug.
If anyone has any helpful ideas about going this way, I'm all ears!

view this post on Zulip Javier Prieto (Jan 04 2021 at 12:44):

That's essentially the trick behind projection-valued measures, right? You start by associating a measure to every vector in a Hilbert space and then projectivize to get the usual PVM.

view this post on Zulip Christoph Thies (Jan 15 2021 at 05:15):

May I use this thread to ask a question about normalisation for the distribution monad, or finitary Giry monad, as in Fritz and Perrone (2018) , Section 6.2 (where the monad called PP below is called DD)?

I would like to turn a finite family II of samples in Z ⁣:FinSetZ\colon \mathsf{FinSet}, that is a map a^ ⁣:IZ,\hat{a}\colon I\rightarrow Z, into a distribution a ⁣:PZa\colon PZ. With the operation of taking convex combinations described in Fritz and Perrone (2018), I could count the number of occurrences of each ZZ value in the image of a^\hat{a} and form a convex sum that is weighted accordingly, i.e. normalised, to get a ⁣:PZa\colon PZ. How can I describe this operation categorically?

I am asking this question in this thread because it seems to me that normalisation should 'automatise' this operation roughly like this. We can consider a^\hat{a} as element of the product i ⁣:IZ\prod_{i\colon I}Z and apply the unit of the unnormalised monad UU to get an element of i ⁣:IUZ\prod_{i\colon I}UZ of which we add up the factors to get an unnormalised distribution in UZUZ. Normalisation turns this into a probability distribution in PZPZ. Is this an application of the normalisation @Nathaniel Virgo referred to above?

view this post on Zulip Tobias Fritz (Jan 16 2021 at 06:13):

Right! This is a special case of the normalisation that @Nathaniel Virgo referred to above. And it's a particularly nice special case, because it is natural: for every ZZ it's described by a map ZIPZZ^I \to PZ, and this makes the naturality square with respect to any map ZZZ \to Z' commute. The point is that there's no need to count the number of occurences of each value in the image and weigh accordingly; for f:IZf : I \to Z, the associated normalized distribution is just 1IiIδf(i)\frac{1}{|I|}\sum_{i \in I} \delta_{f(i)}, which autmatically gives the right thing even in case that some of the values coincide. And what makes it natural in ZZ is that the denominator is fixed.

view this post on Zulip Christoph Thies (Jan 22 2021 at 06:00):

I have another question if I may. The problem seems simple so I might be overcomplicating things. Also, I am sure for the initiated these ideas are contained in Jacobs (2019). I am trying to express them so that I understand.

Let's consider the above map ZJPZZ^J\to PZ that normalises a set of samples as composite ZJRUZNPZZ^J\xrightarrow{R}UZ\xrightarrow{N}PZ. I would like a function w ⁣:ZN\underline{w}\colon Z\to \mathbb{N} to induce functions w ⁣:UZUZw\colon UZ\to UZ, ν ⁣:PZPZ\nu\colon PZ\to PZ, and s ⁣:ZJZKs\colon Z^J\to Z^K such that the diagram below commutes.
selection-1.png

The map w ⁣:UZUZw\colon UZ\to UZ is in a sense diagonal and integral. For p ⁣:UZp\colon UZ (i.e., p ⁣:ZR0p\colon Z\to \mathbb{R}_{\geq 0}) define w(p)(z)=w(z)p(z)w(p)(z) = \underline{w}(z)p(z). For z ⁣:Zz\colon Z, ww sends δz\delta_z to w(z)δz\underline{w}(z)\delta_z.

The map ν ⁣:PZPZ\nu\colon PZ\to PZ is defined by forgetting the normalisation, applying ww, and normalising with N ⁣:UZPZN\colon UZ\to PZ.

As to the map ss, I'm not sure I can say this correctly, but I would like to replace ZJZ^J by a category PopZ\mathsf{Pop_Z} that contains all finite populations, that is functions ZKZ^K for any K ⁣:FinSetK\colon\mathsf{FinSet}. I imagine this category as an integer version of UZUZ where the coefficients in the formal sums are natural numbers, and I believe it is the same as the category of multisets in Jacobs (2019). Then I would like to say that s ⁣:PopZPopZs\colon\mathsf{Pop_Z}\rightarrow\mathsf{Pop_Z} is a functor induced by w\underline{w} such that the diagram above commutes for all J ⁣:FinSetJ\colon\mathsf{FinSet}.

I would like to define PopZ\mathsf{Pop_Z} as the free symmetric monoidal category generated by the elements of ZZ, that is the functions {}Z\{*\}\rightarrow Z. The monoidal unit is the empty function from the empty set, the monoidal product of two functions a ⁣:ZJa\colon Z^J and b ⁣:ZKb\colon Z^K is the combined function on the disjoint union JKJ\cup K, ab ⁣:ZJKa\otimes b\colon Z^{J\cup K}.

Given w ⁣:ZN\underline{w}\colon Z\to\mathbb{N}, define s ⁣:PopZPopZs\colon\mathsf{Pop}_Z\to\mathsf{Pop}_Z as symmetric monoidal functor on the generators {{}z,z ⁣:Z}\{\{*\}\to z, z\colon Z\} by

s(z)=zzzw(z)factors.s(z) =\underbrace{z\otimes z\otimes \cdots\otimes z}_{\underline{w}(z) factors}.

In the end, I would like to be able to relate functors PopZPopZ\mathsf{Pop}_Z\to\mathsf{Pop}_Z with maps PZPZPZ\to PZ, so that a concrete process s ⁣:ZJZKs\colon Z^J\to Z^K in the diagram above is an instance of a 'law' w ⁣:ZN\underline{w}\colon Z\to\mathbb{N}. Does this make any sense?

view this post on Zulip Christoph Thies (Jan 28 2021 at 02:34):

I apologise if the following is a bit messy. I aim to be precise but am not certain at places. The connection with normalisation appears towards the end.

From the above, it seems not too far fetched to define a symmetric monoidal category of metapopulations PopZ2=PopPopZ\mathsf{Pop}^2_Z = \mathsf{Pop}_{\mathsf{Pop}_Z} as free symmetric monoidal category generated by populations of elements of ZZ, that is instances of PopZ\mathsf{Pop}_Z.

There is a symmetric monoidal functor μ ⁣:PopZ2PopZ\mu\colon\mathsf{Pop}^2_Z\to\mathsf{Pop}_Z given on generators z=z1zn ⁣:PopZ2\overline{z} = z_1\otimes\cdots\otimes z_n\colon\mathsf{Pop}^2_Z by μ(z)=z1zn ⁣:PopZ\mu (\overline{z}) = z_1\otimes\cdots\otimes z_n\colon\mathsf{Pop}_Z, as in the picture below.

Addition.jpg

Given a law w ⁣:ZN\underline{w}\colon Z\to\mathbb{N} with associated functor s ⁣:PopZPopZ,s(z)=w(z)z ⁣:PopZs\colon\mathsf{Pop}_Z\to\mathsf{Pop}_Z, s(z) = \underline{w}(z)z\colon\mathsf{Pop}_Z, we can define a functor Pops ⁣:PopZ2PopZ2\mathsf{Pop}_s\colon\mathsf{Pop}^2_Z\to\mathsf{Pop}^2_Z on generators z ⁣:PopZ\overline{z}\colon\mathsf{Pop}_Z by

Pops(z)=s(z) ⁣:PopZ2\mathsf{Pop}_s (\overline{z}) = s(\overline{z})\colon\mathsf{Pop}^2_Z

that satisfies

μPops=sμ ⁣:PopZ2PopZ,\mu\circ\mathsf{Pop}_s = s\circ\mu\colon\mathsf{Pop}^2_Z\to\mathsf{Pop}_Z,

that is the induced transformation ignores the metapopulation structure.

On the other hand, we may again consider 'laws' w ⁣:PopZN\overline{w}\colon\mathsf{Pop}_Z\to\mathbb{N}, this time for populations rather than ZNZ\to\mathbb{N} for samples, that give rise to functors s ⁣:PopZ2PopZ2s\colon\mathsf{Pop}^2_Z\to\mathsf{Pop}^2_Z that send a population z ⁣:PopZ\overline{z}\colon\mathsf{Pop}_Z to a metapopulation of w(z)\overline{w}(\overline{z}) copies of itself,

s(z)=w(z)z ⁣:PopZ2.s(\overline{z}) = \overline{w}(\overline{z})\overline{z}\colon\mathsf{Pop}^2_Z.

Finally, another way of applying a law w ⁣:ZN\underline{w}\colon Z\to\mathbb{N} in a metapopulation is to apply it internal to the populations while keeping the 'size', or magnitude, of each population fixed. The magnitude of a population is the symmetric monoidal functor  ⁣:PopZN|\cdot|\colon\mathsf{Pop}_Z\to\mathbb{N} given on generators by

z=1 ⁣:N,z ⁣:Z|z|=1\colon\mathbb{N}, z\colon Z

(I think the word 'magnitude' for the above map is consistent with the definition in Tom Leinster, 'Entropy and Diversity', as the discrete case where points are either different or the same, e.g., in Example 6.4.6. iii. on p. 213). Unfortunately, since transformations s ⁣:PopZPopZs\colon\mathsf{Pop}_Z\to\mathsf{Pop}_Z generally don't preserve magnitude, populations are replaced by distributions and I don't know how to draw them analogous to the integral instances in the picture above. But I think with

M ⁣:PopZ2PopZ2,M(z)=zz ⁣:PopZ2 for z ⁣:PopZ,M\colon\mathsf{Pop}^2_Z\to\mathsf{Pop}^2_Z, M(\overline{z}) = |\overline{z}|\overline{z}\colon\mathsf{Pop}^2_Z \text{ for }\overline{z}\colon\mathsf{Pop}_Z,

the map that applies the transformation w ⁣:ZN\underline{w}\colon Z\to\mathbb{N} internal to the populations while keeping their size relative to each other constant could be written like this

PopZ2MPopZ2PopsPopZ2PopNPopPZ\mathsf{Pop}^2_Z\xrightarrow{M}\mathsf{Pop}^2_Z\xrightarrow{\mathsf{Pop}_s}\mathsf{Pop}^2_Z\xrightarrow{\mathsf{Pop}_N}\mathsf{Pop}_{PZ}

with normalisation N ⁣:PopZPZN\colon\mathsf{Pop}_Z\to PZ as further above. Here the magnitude of a population is translated into its multiplicity such that after normalisation is applied to each of its transformed copies separately, the magnitude is unchanged while elements in PopZ\mathsf{Pop}_Z have been exchanged to units of a different kind, namely PZPZ.

Do these constructions seem roughly sensible? I am aware that the above can be described in terms of more general structures and constructions, but I am trying to understand from a specific viewpoint that I am happy to explain.

view this post on Zulip Nathaniel Virgo (Jan 28 2021 at 04:05):

What's a 'law' in this context? (And am I guessing correctly that "population" means the same thing as "multiset" or "free commutative monoid"?)

view this post on Zulip Christoph Thies (Jan 28 2021 at 04:33):

Nathaniel Virgo said:

And am I guessing correctly that "population" means the same thing as "multiset" or "free commutative monoid"?

Yes, a population is a multiset or an element of the commutative monoid generated freely by the elements of ZZ. A metapopulation is an element of the commutative monoid generated freely by populations of elements of ZZ.

A law (the term has no meaning to me other than that a law determines the outcome of a class of processes) ZNZ\to\mathbb{N} or PopZN\mathsf{Pop}_Z\to\mathbb{N} determines by how many copies of itself an individual z ⁣:Zz\colon Z or a population z ⁣:PopZ\overline{z}\colon\mathsf{Pop}_Z is replaced during the associated process PopZPopZ\mathsf{Pop}_Z\to\mathsf{Pop}_Z or PopZ2PopZ2\mathsf{Pop}^2_Z\to\mathsf{Pop}^2_Z. These laws are to represent absolute fitness, or number of offspring, of an individual or population when heredity is perfect, that is when offspring is identical to the parent. More importantly, ZNZ\to\mathbb{N} determines the corresponding process PopZPopZ\mathsf{Pop}_Z\to\mathsf{Pop}_Z individually for elements of ZZ, and likewise PopZ2PopZ2\mathsf{Pop}^2_Z\to\mathsf{Pop}^2_Z is determined individually for populations of elements of ZZ.

view this post on Zulip Nathaniel Virgo (Jan 28 2021 at 04:56):

Right, I got it, so this getting somewhere towards the Price equation from population genetics. I've also wondered a bit about how normalisation relates to that, because you can think about an unnormalised kernel between countable sets as being like a Markov kernel associated with a fitness "law" in this sense. (Except that it takes values in R0\mathbb{R}_{\ge 0} rather than N\mathbb{N}). In that context a normalised kernel is one where every individual has exactly one offspring, so in some sense "selection" is what makes a difference between normalised kernels and unnormalised ones.

I don't have a specific comment on your constructions currently, except to say I think it's an interesting direction.

view this post on Zulip Christoph Thies (Jan 28 2021 at 05:18):

Thank you, @Nathaniel Virgo!

Nathaniel Virgo said:

you can think about an unnormalised kernel between countable sets as being like a Markov kernel associated with a fitness "law" in this sense. (Except that it takes values in R0\mathbb{R}_{\ge 0} rather than N\mathbb{N}).

Yes, laws ZNZ\to\mathbb{N} correspond to 'diagonal' kernels from the parent to the offspring population. I chose N\mathbb{N} rather than R0\mathbb{R}_{\geq 0} because it connects with the concept of reproduction of biological individuals, such as giraffes, in a classical sense. For populations, PopZN\mathsf{Pop}_Z\to\mathbb{N} should be replaced by PopZR0\mathsf{Pop}_Z\to\mathbb{R}_{\geq 0} since absolute fitness for populations may be usefully imagined as expressing persistence, that is probability of future existence (in specified circumstances), rather than number of offspring.

view this post on Zulip Nathaniel Virgo (Jan 28 2021 at 05:25):

I agree, both are relevant for biology. I guess the most general model of population dynamics would be a stochastic map from (integer-valued) populations to integer-valued populations. Then the expected number of offspring is in $\mathbb{R}$, but the distribution can be important too. There's some stuff about that in this paper from a Price equation related perspective.

view this post on Zulip Christoph Thies (Jan 28 2021 at 05:32):

Nathaniel Virgo said:

Then the expected number of offspring is in R\mathbb{R}, but the distribution can be important too.

Yes, it may be important. However, I assume deterministic fitness. I think the kernels appear because the maps PopZPopZ\mathsf{Pop}_Z\to\mathsf{Pop}_Z are induced by Kleisli morphisms ZPopZZ\to\mathsf{Pop}_Z.

view this post on Zulip Nathaniel Virgo (Jan 28 2021 at 05:37):

Sorry, I was just chatting about possible other things to consider - I realise there's a lot still to be said even if you don't consider the stochastic element at all.