This is kind of a silly question, but I keep coming back to it, and thought maybe it's worth starting a conversation. In short the question is, if we work in a category of unnormalised Markov kernels, how can we characterise the operation of normalising an unnormalised kernel? It's not a functor in the most obvious sense, but perhaps there is some other 'nice' way to approach it category-theoretically?
I'll write only about the finite case below, but if this can be / has been resolved in a more general measure-theoretic context I'll be very happy.
A simple example of a Markov category is $\mathsf{FinStoch}$, where objects are finite sets and a morphism $f\colon X\to Y$ is a Markov kernel, which defines a normalised probability distribution over $Y$ for each element of $X$. However, sometimes it's helpful to work with unnormalised distributions instead - I'll call the resulting category $\mathsf{UFinStoch}$.
$\mathsf{FinStoch}$ is a Markov category as defined by Fritz, meaning that every object has a "copy" and a "delete" operation, where delete is natural but copy isn't. $\mathsf{UFinStoch}$ is an "unnormalised Markov category", meaning that delete isn't necessarily natural either. If we take $\mathsf{UFinStoch}$ and restrict to only those morphisms $f$ for which $\mathrm{del}_Y \circ f = \mathrm{del}_X$ (equivalently, $\sum_y f(y\mid x) = 1$ for every $x$), then we get $\mathsf{FinStoch}$.
But what if we want to take an arbitrary unnormalised kernel and then normalise it? This seems a fairly common thing to want to do - when we work in an unnormalised context, we usually want to turn it back into a normalised distribution in the end. We can define a function $\mathrm{norm}$ such that
$$\mathrm{norm}(f)(y\mid x) = \frac{f(y\mid x)}{\sum_{y'} f(y'\mid x)}.$$
This is a (partial) map from morphisms in $\mathsf{UFinStoch}$ to morphisms in $\mathsf{FinStoch}$, but it isn't a functor, since in general
$$\mathrm{norm}(g\circ f) \neq \mathrm{norm}(g)\circ\mathrm{norm}(f),$$
even when all three maps are defined.
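(For concreteness, here is a minimal numerical sketch of the failure of functoriality, in Python with numpy; kernels are matrices with one row per input, composition $g\circ f$ is the matrix product, and the name `norm` is just illustrative.)

```python
import numpy as np

# A kernel X -> Y is an array of shape (|X|, |Y|); composition g∘f is f @ g.
def norm(f):
    """Normalise each row (partial: rows summing to zero are excluded)."""
    return f / f.sum(axis=1, keepdims=True)

f = np.array([[1.0, 1.0]])       # one input, two intermediate states, equal weight
g = np.array([[2.0, 0.0],
              [0.0, 1.0]])       # rescales the two intermediate states differently

print(norm(f @ g))               # norm(g∘f)       = [[2/3, 1/3]]
print(norm(f) @ norm(g))         # norm(g)∘norm(f) = [[1/2, 1/2]]
```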
So the question is, given that normalisation isn't a functor in this sense, what is it? It seems to be sort of badly behaved from a category theoretic point of view, but I have the feeling this might be because I'm looking at it wrong. Is normalisation a functor in some other, different, sense? Or is there some way to view it as being defined by a universal property, or as being the result of some other "nice" category-theoretic construction?
Recently I had the idea that perhaps there should be a double category, or even an equipment, where the 'proarrows' are unnormalised Markov kernels and the other arrows are the normalised ones. This is intuitively appealing because I think of normalised kernels as stochastic functions, whereas unnormalised kernels have more of a "relation-like" feel. I couldn't get it to work though, because I don't know what the 2-cells should be.
Anyway, I'll be happy if someone else has thought about this as well and has anything they can share.
In order to find a potential universal property of normalization, how about first finding some purely equational properties that it enjoys instead of the functoriality? What could those be?
One potentially useful piece of additional structure on UFinStoch is that the hom-sets are partially ordered, simply by comparing matrices entrywise. So could the 2-cells in your double category just be these ordering relations?
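(In matrix terms that proposed 2-cell is just entrywise comparison; a minimal sketch, assuming the same matrix representation of kernels as above:)

```python
import numpy as np

def leq(a, b):
    """Candidate 2-cell a <= b: entrywise order on matrices of the same shape."""
    return bool(np.all(a <= b))

print(leq(np.array([[0.5, 0.5]]), np.array([[1.0, 0.5]])))  # True
print(leq(np.array([[0.5, 0.5]]), np.array([[0.4, 0.5]])))  # False
```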
BTW how do you imagine dealing with the nuisance of zero denominators? Just exclude the offending morphisms from UFinStoch?
One equational property is that every non-zero morphism $f$ in $\mathsf{UFinStoch}$ decomposes uniquely like this:
$$f(y\mid x) = s(x)\,\bar{f}(y\mid x),$$
where $\bar{f}$ is normalised (and $s(x) = \sum_y f(y\mid x)$ is the total mass at each input). That could serve as a definition of normalisation, but it's not very clear to me where to go with it after that.
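(Entrywise, and assuming every row of $f$ is non-zero, the factorisation is easy to sanity-check numerically, with $s$ and $\bar{f}$ as above:)

```python
import numpy as np

f = np.array([[1.0, 3.0],
              [2.0, 2.0]])              # an unnormalised kernel with non-zero rows

s = f.sum(axis=1)                       # total mass at each input (the "weight" part)
f_bar = f / s[:, None]                  # the normalised part

assert np.allclose(f, np.diag(s) @ f_bar)   # f = weight · normalised part
```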
I thought about having the 2-cells be that ordering relation, but unless I made a mistake (which is entirely possible - I'll double check tomorrow) it didn't seem to work. We can start with a (strict) 2-category defined that way. I hoped that normalised kernels would have right adjoints, which would allow us to extend it to a double category in a nice way, which would give an equipment. However, it seemed that the only things that have right adjoints in that 2-category are deterministic injective functions, so that didn't work. I was quite surprised by that.
There might be some other, non-equipment way to make it a double category though. We'd need some way to extend that partial order to squares like this:
About the zero denominators - I'm not sure about that. I guess I'm hoping the right way to deal with them will become obvious at some point.
[Message erased, since you already mentioned the same thing, sorry.]
Is there something like a morphism of monads from $U$ to $D$, where the first is the monad of unnormalized distributions and the second is the monad of normalized distributions?
One thing that I've always been struggling with is defining normalization as a sort of quotient operation with respect to rescaling, analogous to projectivization in geometry. A space of (non-negative) measures is then turned into a space of probability measures (the "rays"), plus a zero measure. This is also where conditioning should take values: conditioning on events of zero measure becomes a feature and not a bug.
If anyone has any helpful ideas about going this way, I'm all ears!
That's essentially the trick behind projection-valued measures, right? You start by associating a measure to every vector in a Hilbert space and then projectivize to get the usual PVM.
May I use this thread to ask a question about normalisation for the distribution monad, or finitary Giry monad, as in Fritz and Perrone (2018), Section 6.2 (where the monad called $D$ below goes by a different name)?
I would like to turn a finite family of samples in $X$, that is a map $x\colon N\to X$, into a distribution $p\in DX$. With the operation of taking convex combinations described in Fritz and Perrone (2018), I could count the number of occurrences of each value in the image of $x$ and form a convex sum that is weighted accordingly, i.e. normalised, to get $p$. How can I describe this operation categorically?
I am asking this question in this thread because it seems to me that normalisation should 'automatise' this operation roughly like this. We can consider $x$ as an element of the product $X^N$ and apply the unit of the unnormalised monad $U$ componentwise to get an element of $(UX)^N$, of which we add up the factors to get an unnormalised distribution in $UX$. Normalisation turns this into a probability distribution in $DX$. Is this an application of the normalisation @Nathaniel Virgo referred to above?
Right! This is a special case of the normalisation that @Nathaniel Virgo referred to above. And it's a particularly nice special case, because it is natural: for every $N$ it's described by a map $X^N \to DX$, and this makes the naturality square with respect to any map $g\colon X\to Y$ commute. The point is that there's no need to count the number of occurrences of each value in the image and weigh accordingly; for $(x_1,\dots,x_N)$, the associated normalized distribution is just $\frac{1}{N}\sum_i \delta_{x_i}$, which automatically gives the right thing even in case that some of the values coincide. And what makes it natural in $X$ is that the denominator $N$ is fixed.
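(A small Python sketch of this special case, with hypothetical names `empirical` and `pushforward`; the final assert is exactly the naturality square for a function $g\colon X\to Y$:)

```python
from collections import Counter

def empirical(samples):
    """X^N -> DX: send (x_1, ..., x_N) to (1/N) * sum_i delta_{x_i}."""
    n = len(samples)
    return {x: c / n for x, c in Counter(samples).items()}

def pushforward(g, p):
    """Dg: DX -> DY for a function g: X -> Y."""
    q = {}
    for x, w in p.items():
        q[g(x)] = q.get(g(x), 0.0) + w
    return q

xs = ["a", "b", "a", "a"]
g = str.upper                      # an arbitrary function X -> Y
print(empirical(xs))               # {'a': 0.75, 'b': 0.25}

# Naturality: relabel then normalise == normalise then push forward.
assert empirical([g(x) for x in xs]) == pushforward(g, empirical(xs))
```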
I have another question if I may. The problem seems simple so I might be overcomplicating things. Also, I am sure for the initiated these ideas are contained in Jacobs (2019). I am trying to express them so that I understand.
Let's consider the above map that normalises a set of samples as the composite $X^N \to (UX)^N \to UX \to DX$. I would like a function $f\colon X\to\mathbb{N}$ to induce functions on each stage of this composite such that the diagram below commutes.
[image: selection-1.png]
The map is in a sense diagonal and integral. For (i.e., ) define . For , sends to .
The map is defined by forgetting the normalisation, applying , and normalising with .
As to the map on populations, I'm not sure I can say this correctly, but I would like to replace $X^N$ by a category that contains all finite populations, that is functions $x\colon N\to X$ for any $N$. I imagine this category as an integer version of $DX$ where the coefficients in the formal sums are natural numbers, and I believe it is the same as the category of multisets in Jacobs (2019). Then I would like to say that the map is a functor induced by $f$ such that the diagram above commutes for all $N$.
I would like to define it as the free symmetric monoidal category generated by the elements of $X$, that is the functions $x\colon 1\to X$. The monoidal unit is the empty function from the empty set, and the monoidal product of two functions $x\colon N\to X$ and $y\colon M\to X$ is the combined function $[x,y]\colon N\sqcup M\to X$ on the disjoint union.
Given $f\colon X\to\mathbb{N}$, define the induced map as a symmetric monoidal functor on the generators by sending a generator $x$ to the population consisting of $f(x)$ copies of $x$.
In the end, I would like to be able to relate functors on populations with maps $f\colon X\to\mathbb{N}$, so that a concrete process in the diagram above is an instance of a 'law' $f$. Does this make any sense?
I apologise if the following is a bit messy. I aim to be precise but am not certain at places. The connection with normalisation appears towards the end.
From the above, it seems not too far-fetched to define a symmetric monoidal category of metapopulations as the free symmetric monoidal category generated by populations of elements of $X$, that is, by the populations defined above.
There is a symmetric monoidal functor given on generators by , as in the picture below.
Given a law $f\colon X\to\mathbb{N}$ with its associated functor as above, we can define a functor on metapopulation generators by applying that functor to each population; that is, the induced transformation ignores the metapopulation structure.
On the other hand, we may again consider 'laws' $F$, this time for populations rather than for samples, that give rise to functors that send a population $P$ to a metapopulation of $F(P)$ copies of itself.
Finally, another way of applying a law in a metapopulation is to apply it internal to the populations while keeping the 'size', or magnitude, of each population fixed. The magnitude is the symmetric monoidal functor given on generators by $x\mapsto 1$, so that the magnitude of a population is its total number of individuals.
(I think the word 'magnitude' for the above map is consistent with the definition in Tom Leinster, 'Entropy and Diversity', in the discrete case where points are either different or the same, e.g., Example 6.4.6 (iii) on p. 213.) Unfortunately, since transformations generally don't preserve magnitude, populations are replaced by distributions, and I don't know how to draw them analogously to the integral instances in the picture above. But I think with
the map that applies the transformation internal to the populations while keeping their size relative to each other constant could be written like this
with normalisation as further above. Here the magnitude of a population is translated into its multiplicity such that, after normalisation is applied to each of its transformed copies separately, the magnitude is unchanged while elements in $X$ have been exchanged for units of a different kind.
Do these constructions seem roughly sensible? I am aware that the above can be described in terms of more general structures and constructions, but I am trying to understand from a specific viewpoint that I am happy to explain.
What's a 'law' in this context? (And am I guessing correctly that "population" means the same thing as "multiset" or "free commutative monoid"?)
Nathaniel Virgo said:
And am I guessing correctly that "population" means the same thing as "multiset" or "free commutative monoid"?
Yes, a population is a multiset or an element of the commutative monoid generated freely by the elements of $X$. A metapopulation is an element of the commutative monoid generated freely by populations of elements of $X$.
A law $f\colon X\to\mathbb{N}$ (the term has no meaning to me other than that a law determines the outcome of a class of processes) or $F$ determines by how many copies of itself an individual or a population is replaced during the associated process. These laws are to represent absolute fitness, or number of offspring, of an individual or population when heredity is perfect, that is when offspring is identical to the parent. More importantly, $f$ determines the corresponding process individually for elements of $X$, and likewise $F$ is determined individually for populations of elements of $X$.
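(If I'm reading the setup right, here is a minimal Python sketch of a law acting on a population, with populations as multisets; all names are illustrative:)

```python
from collections import Counter

def magnitude(pop):
    """Total number of individuals in a population (a multiset over X)."""
    return sum(pop.values())

def apply_law(f, pop):
    """Perfect heredity: each individual x is replaced by f(x) copies of itself."""
    return Counter({x: n * f(x) for x, n in pop.items() if f(x) > 0})

pop = Counter({"a": 2, "b": 1})
f = {"a": 3, "b": 0}.get               # law: 'a' has 3 offspring, 'b' has none
print(apply_law(f, pop))               # Counter({'a': 6})
print(magnitude(apply_law(f, pop)))    # 6: laws need not preserve magnitude
```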
Right, I got it, so this is getting somewhere towards the Price equation from population genetics. I've also wondered a bit about how normalisation relates to that, because you can think about an unnormalised kernel between countable sets as being like a Markov kernel associated with a fitness "law" in this sense. (Except that it takes values in $\mathbb{R}_{\geq 0}$ rather than $\mathbb{N}$.) In that context a normalised kernel is one where every individual has exactly one offspring, so in some sense "selection" is what makes the difference between normalised kernels and unnormalised ones.
I don't have a specific comment on your constructions currently, except to say I think it's an interesting direction.
Thank you, @Nathaniel Virgo!
Nathaniel Virgo said:
you can think about an unnormalised kernel between countable sets as being like a Markov kernel associated with a fitness "law" in this sense. (Except that it takes values in $\mathbb{R}_{\geq 0}$ rather than $\mathbb{N}$.)
Yes, laws correspond to 'diagonal' kernels from the parent to the offspring population. I chose $\mathbb{N}$ rather than $\mathbb{R}_{\geq 0}$ because it connects with the concept of reproduction of biological individuals, such as giraffes, in a classical sense. For populations, $\mathbb{N}$ should be replaced by $[0,1]$ since absolute fitness for populations may be usefully imagined as expressing persistence, that is probability of future existence (in specified circumstances), rather than number of offspring.
I agree, both are relevant for biology. I guess the most general model of population dynamics would be a stochastic map from (integer-valued) populations to integer-valued populations. Then the expected number of offspring is in $\mathbb{R}$, but the distribution can be important too. There's some stuff about that in this paper from a Price equation related perspective.
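(A quick sketch of that most general model, under the assumption that each individual draws its offspring count independently from a per-type distribution; every realisation is an integer-valued population even though the expectation is real-valued:)

```python
import random
from collections import Counter

def step(pop, offspring_dist):
    """One generation: each individual of type x draws an offspring count k
    with probability offspring_dist[x][k], for k = 0, 1, 2, ..."""
    new = Counter()
    for x, n in pop.items():
        for _ in range(n):
            k = random.choices(range(len(offspring_dist[x])),
                               weights=offspring_dist[x])[0]
            if k:
                new[x] += k
    return new

pop = Counter({"a": 10})
dist = {"a": [0.25, 0.5, 0.25]}    # expected offspring 1.0, but with variance
print(step(pop, dist))             # e.g. Counter({'a': 11}); varies per run
```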
Nathaniel Virgo said:
Then the expected number of offspring is in $\mathbb{R}$, but the distribution can be important too.
Yes, it may be important. However, I assume deterministic fitness. I think the kernels appear because the maps are induced by Kleisli morphisms .
Sorry, I was just chatting about possible other things to consider - I realise there's a lot still to be said even if you don't consider the stochastic element at all.