Does anyone know of a theorem in categorical probability that could be regarded as a categorical version of the Radon-Nikodym Theorem? I have wondered about this a couple of times, but a short literature search never turned up a result I'd be happy with.
Prakash Panangaden gave a very nice talk at UCR where he used the theorem almost synthetically: https://categorytheory.zulipchat.com/#narrow/stream/229966-ACT.40UCR-seminar/topic/April.208th.3A.20Prakash.20Panangaden
This question has been on my mind over the past week, and I'd now like to give a partial answer from the Markov categories perspective. Naively, one might think that Markov categories are not expressive enough to talk about densities and the Radon-Nikodym theorem. I think that this is largely true, but one can sidestep the appeal to densities and the Radon-Nikodym theorem to some extent. (And there may also be a possibility to build densities natively into the framework, but I wouldn't yet know how to do that.)
But first, why are densities important? I can see two main reasons besides the Radon-Nikodym theorem:
The opposite variance of densities is related to the fact that Bayesian inversion can be described as a dagger functor, as explained in Remark 13.10 of my Markov cats paper. So while I don't know how to formulate (let alone prove) a general Radon-Nikodym theorem for Markov categories, there is a more particular construction which works in any Markov category with conditionals. Namely if $f \colon X \to Y$ is any measurable map, $\mu$ a probability measure on $X$ and $\nu$ a probability measure on $Y$ with $\nu \ll f_*\mu$, then we can form a new measure on $X$ given by $f^*\!\left(\frac{d\nu}{d f_*\mu}\right) \cdot \mu$. Here, $f^*$ denotes the pullback of functions by composition as above.
How can we construct this new measure using only the Markov category structure? This is possible because that measure turns out to be given by the composite $f^\dagger \circ \nu$, where $f^\dagger \colon Y \to X$ is a Bayesian inverse of $f$ with respect to $\mu$; showing this just requires some calculation, but I think it is closely related to the construction of conditional expectations in terms of the Radon-Nikodym theorem. This measure can be shown to be well-defined, i.e. independent of the particular choice of Bayesian inverse, as soon as $\nu \ll f_*\mu$ holds; in a Markov cat, this means by definition that $f_*\mu$-a.s. equality of morphisms out of $Y$ must imply $\nu$-a.s. equality. So in particular types of situations, we can get around the Radon-Nikodym theorem in a way which makes sense in any Markov category with conditionals.
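In the Kleisli category of the Giry monad, the calculation should go as follows (writing the Bayesian inverse as a kernel $f^\dagger(dx \mid y)$ just for this sketch): for measurable $A \subseteq X$,
$$(f^\dagger \circ \nu)(A) = \int_Y f^\dagger(A \mid y)\, \nu(dy) = \int_Y f^\dagger(A \mid y)\, \frac{d\nu}{d f_*\mu}(y)\, (f_*\mu)(dy) = \int_A \frac{d\nu}{d f_*\mu}(f(x))\, \mu(dx),$$
where the last step uses the defining property of the Bayesian inverse, i.e. that it disintegrates $\mu$ over $f_*\mu$.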
This construction is a special case of letting a Radon-Nikodym derivative act on a measure by multiplication, and may thus seem a bit removed from the Radon-Nikodym theorem itself. This is true, but there still seem to be important applications of this special case. In particular, the abstract Fisher-Neyman factorization theorem (Theorem 14.5 of my paper) uses this type of construction (although this is not explained in the paper, because I didn't know this at the time of writing).
I'm not sure to what extent other applications of the Radon-Nikodym theorem can be sidestepped like this. It's hard to imagine that all of them can be. For example, one may ask whether for given measures on the same measurable space, with one absolutely continuous with respect to the other, the new measure would be similarly constructible from the Markov category structure and conditionals only. I have a sketch of a proof that this is not possible in a generic Markov category with conditionals.
Nice. Can you say a bit more about how you interpret $\ll$ in a Markov category? Sorry, I may have missed it in your article.
In the category of s-finite kernels, the morphisms $X \to I$ amount to measurable functions $X \to [0, \infty]$. So I think one can easily talk about a Radon-Nikodym derivative of a measure $\nu$ with respect to a measure $\mu$ as a morphism $r \colon X \to I$ such that rescaling $\mu$ by $r$ gives $\nu$. (This looks especially easy in your diagrammatic notation.) But I haven't yet tried to phrase/prove the RN theorem in this abstract categorical setting of synthetic measure theory.
Interesting! That sounds like a good reason to consider categories like Markov categories, but where instead the terminality of the monoidal unit $I$ is dropped (and replaced by the mere existence of a non-natural unit effect; I think @Arthur Parzygnat has been working with this definition).
Either way, $\nu \ll \mu$ can be defined to mean the following: if $f =_{\mu\text{-a.s.}} g$ for any two parallel morphisms $f$ and $g$ out of $X$, then also $f =_{\nu\text{-a.s.}} g$. To see that this is equivalent to the standard definition in the Kleisli category of the Giry monad (on all measurable spaces), just take $f$ to be the indicator function of a potential null set and $g = 0$; the other direction seems to follow most easily by using the RN theorem. In fact, this synthetic definition makes sense for any two morphisms $\mu$ and $\nu$ with the same codomain $X$, and I believe that the semantics of the condition in the Kleisli category of the Giry monad is then similar. Perhaps this still holds in the category of s-finite kernels? (BTW, this definition was not in my paper yet, so you can't possibly have missed it.)
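Writing it out, with $\mathrm{copy}_X$ the comultiplication, the two definitions should read:
$$f =_{\mu\text{-a.s.}} g \quad :\Longleftrightarrow \quad (f \otimes \mathrm{id}_X) \circ \mathrm{copy}_X \circ \mu = (g \otimes \mathrm{id}_X) \circ \mathrm{copy}_X \circ \mu,$$
$$\nu \ll \mu \quad :\Longleftrightarrow \quad \big( f =_{\mu\text{-a.s.}} g \implies f =_{\nu\text{-a.s.}} g \ \text{ for all parallel } f, g \text{ out of } X \big).$$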
Now following your idea of how densities act on measures, we can easily express the RN theorem synthetically, although I'm "almost sure" that it can't be proven to hold from just the categorical structure and the existence of conditionals. However, when expressed diagrammatically like this, the RN theorem acquires the flavour of a factorization property vaguely reminiscent of the existence of conditionals. So I wonder:
Puzzle: Do the existence of conditionals and the RN theorem have a common generalization?
If so, their common generalization may be a very natural candidate axiom for synthetic measure/probability theory. Or am I completely off track here?
Thanks @Tobias Fritz. Matthijs Vákár and Luke Ong worked out the theory of R-N derivatives and conditional probabilities for s-finite kernels here.
They give a general theorem for conditional probability that includes R-N derivatives as a special case (Theorem 13 and Remark 2). It would be great to write this in abstract categorical language. Is that the kind of thing you were thinking of in your puzzle?
But they use special forms of almost-sure equality and absolute continuity to deal with the possible infinities. Maybe this is your categorical formulation immediately translated to the category of s-finite kernels, but I am not yet sure.
@Sam Staton Wow! Yes, that indeed looks like a very nice answer to my puzzle.
Just to make sure I understand: in the extra condition just after footnote 7 of Ong and Vákár, I assume that should be , right? I've been trying to see how this condition could possibly be implied by my synthetic definition of absolute continuity proposed above, when applied to the category of s-finite kernels. What I seem to be getting (with and for simplicity) is that is equivalent to: for all measurable and , we have that implies . Hence I do not know how to state the generalized disintegration theorem of Ong and Vákár synthetically. Perhaps it's possible upon tweaking the definitions on either side a bit more.
That Vákár and Ong paper looks very nice indeed!
Does anyone know for a fact whether the category of s-finite kernels is (an unnormalized) Markov category in the sense of Tobias' paper?
Tobias Fritz said:
is equivalent to: for all measurable and , we have that implies .
I have no idea what you are referring to here to be honest, would you mind elaborating?
The assumption in Theorem 13 seems quite reasonable. As far as I understand it, events of both measure $0$ and measure $\infty$ are part of a "sink of no return" (SONR) in this formalism, erasing all possible distinctions that could be made by future inferences (the only exception being that the $\infty$ sink can leak into the $0$ sink as here). The assumption just says that the SONR of one measure includes that of the other, just as in the standard version of the result, right?
It seems to me that the most direct way to adapt the (aforementioned) synthetic definition of absolute continuity so that it coincides with this condition in Theorem 13 would be to impose an equality saying that copying $\infty$-measure events is the same as taking a product of $\infty$-measure events. However, this doesn't seem like a good solution in general. A somewhat orthogonal approach would be to equip the objects with a structure akin to that of localizable measurable spaces, I guess.
Hi @Tomáš Gonda, Yes, s-finite kernels do form an "unnormalized Markov category". I called it a commutative Freyd category in my paper.
Thanks @Tobias Fritz, I think you're right, the should be . I would also be interested to see how you derive and implies (for your definition of in s-finite kernels, or in any unnormalized Markov category?) I tried to manipulate the definition but couldn't get very far.
Possibly an easier first thing to try is to consider bounded kernels, i.e. kernels $k \colon X \to Y$ for which there exists a constant $C$ with $k(x, Y) \leq C$ for all $x$ [note the order of quantifiers]. I think these also form an unnormalized Markov category. This is less useful, because you don't have countable coproducts, nor I guess can you expect to have Radon-Nikodym, because some densities are unbounded (e.g. beta(0.5,0.5)). But it avoids the infinities for a moment.
Tomáš Gonda said:
Tobias Fritz said:
is equivalent to: for all measurable and , we have that implies .
I have no idea what you are referring to here to be honest, would you mind elaborating?
Well, I didn't mean to make a definite claim there; my reasoning had been quite heuristic, and perhaps my indication of that wasn't clear enough. As I do things more carefully now, I unfortunately can no longer reproduce it. But fortunately, I now actually do recover the exact conditions of Theorem 13 of Ong and Vákár! Modulo one or two points that I'm not totally sure about, which I will mark separately as bullet points. Let me now go through the reasoning, again in the special case where and .
Assuming that this holds, it implies the obvious s-finite analogue of Lemma 4.2 of my paper. As in Example 13.3, this can then be applied to characterize almost sure equality: we have if and only if
for all measurable sets and in the respective spaces. Some fiddling with the universal quantification over , and using that and may land in the one-element space specifically, shows that the synthetic absolute continuity is equivalent to the following implication: for any two $[0, \infty]$-valued functions $f$ and $g$, if
holds for all , then this implies the same property with $\nu$ in place of $\mu$. Now we can interpret an equation like this as an equality of measures defined by the densities $f$ and $g$ with respect to $\mu$. Thus by Theorem 9 of Ong and Vákár, the above equality is equivalent to the functions being $\mu$-almost everywhere equal on the finite part of $\mu$, and on the infinite part the set of points where one vanishes but the other one doesn't must have measure zero. Since it only matters where the functions are equal and where they vanish, it follows that it's enough to restrict to functions with values in, say, $\{0, 1\}$.
Indeed the first condition arises by considering and , and the second one by and .
Thanks Tobias, this is exciting.
Tobias Fritz said:
- In order to show that two s-finite measures are equal on a product of two measurable spaces $X \times Y$, it's enough to show that they're equal on measurable rectangles $A \times B$. (Correct?)
Maybe I misunderstood you, but the measurable rectangles generate the product sigma algebra, so this is true for any measure?
- Some more fiddling then shows that is equivalent to the conditions given by Ong and Vákár, namely and . (Correct?)
I agree that this is the condition in Vákár-Ong, when and , but were you asking something different?
Also, do I understand you right:
- a synthetic statement of (non-parameterized) Radon-Nikodym would be: if and (in this sense) then there exists that is a R-N derivative (in this sense) of with respect to ? and this holds in s-finite kernels?
and then the next step would be to check it's all still ok if we put and back into the theorem, to get the general form that includes disintegration?
PS I mentioned this thread to Luke and Matthijs.
Sam Staton said:
Maybe I misunderstood you, but the measurable rectangles generate the product sigma algebra, so this is true for any measure?
Good, thanks. I wasn't sure if that applies because I had only ever worked with finite measures before.
Sam Staton said:
I agree that this is the condition in Vákár-Ong, when and , but were you asking something different?
Yes, I was asking about the final "some more fiddling" step in the proof, which I haven't done completely rigorously yet, but I know how it should go.
Sam Staton said:
Also, do I understand you right:
- a synthetic statement of (non-parameterized) Radon-Nikodym would be: if and (in this sense) then there exists that is a R-N derivative (in this sense) of with respect to ? and this holds in s-finite kernels?
and then the next step would be to check it's all still ok if we put and back into the theorem, to get the general form that includes disintegration?
Yes! Let me know in case you'd like me to work out that general case of the argument, which should be fairly straightforward. I would then also produce a more streamlined and completely rigorous version of the proof (one can simplify it a bit by proving the two directions of the equivalence separately). Of course, if you or someone else were to do this, I'd be happy about that too.
Sam Staton said:
PS I mentioned this thread to Luke and Matthijs.
Great! If they're interested in seeing it or in chiming in, we can get a new invite link from a moderator.
Hello,
I have some questions regarding Radon-Nikodym derivatives in Kleisli categories. I hope this is the right place to ask. Also, please excuse inaccuracies, I am not an expert :innocent:.
Given the Giry monad $P$ on, say, measurable spaces, suppose we have $\mu$, $\nu$, and $\lambda$ with $\mu \ll \lambda$ and $\nu \ll \lambda$, so that the corresponding RN derivatives exist. Moreover, assume $M$ is a monoid with multiplication $m \colon M \times M \to M$. With $m$ and the monoidal structure of the monad given by the product distribution, we have the map
Does the structure of $m$ allow conclusions regarding $\mu * \nu$? For example, does it imply $\mu * \nu \ll \lambda$? And if $\frac{d(\mu * \nu)}{d\lambda}$ exists, can it be represented in terms of $\frac{d\mu}{d\lambda}$ and $\frac{d\nu}{d\lambda}$?
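Explicitly, I mean the composite built from the monoidal structure $\nabla$ of $P$ and the multiplication $m$ (if I've set this up correctly):
$$P(M) \otimes P(M) \xrightarrow{\ \nabla\ } P(M \times M) \xrightarrow{\ P(m)\ } P(M), \qquad \mu * \nu := P(m)(\mu \otimes \nu).$$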
Hi Christoph! That's interesting stuff. The composite map is the convolution of probability measures on $M$; for $\mu, \nu \in P(M)$, let's denote their convolution by $\mu * \nu$, which is the usual notation used by analysts. Then how about a slightly simpler version of the question like this: does $\mu \ll \lambda$ and $\nu \ll \lambda$ imply that $\mu * \nu \ll \lambda$?
I think that the answer is no in general. For example, take $M = \mathbb{R}$ considered as a monoid under addition. Like this, what convolution models is exactly the sum of independent random variables: the distribution of a sum of independent $X$ and $Y$ is exactly the convolution of the distribution of $X$ with the distribution of $Y$. Now with $\lambda$ being the uniform measure on $[0,1]$, the convolution $\lambda * \lambda$ will not be supported on $[0,1]$ only, but on $[0,2]$ instead, meaning that $\lambda * \lambda \not\ll \lambda$.
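Here's a quick Monte Carlo sanity check of that counterexample (just a sketch, with `lam` standing in for the uniform measure on $[0,1]$):

```python
import numpy as np

rng = np.random.default_rng(0)

# lam = uniform measure on [0, 1]; a sample of lam * lam is the sum of
# two independent samples of lam
x = rng.uniform(0.0, 1.0, size=100_000)
y = rng.uniform(0.0, 1.0, size=100_000)

# mass that lam * lam puts outside [0, 1]: about 1/2, so lam * lam
# cannot be absolutely continuous with respect to lam
print(np.mean(x + y > 1.0))  # ~0.5
```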
But perhaps you have a specific situation in which additional properties hold?
Thank you for your reply, @Tobias Fritz!
$\mathbb{R}$ with addition is what I had in mind, but perhaps there is more choice involved. I am asking because I would like to understand how the effects of two independent processes on the same system are combined. However, I am not at all sure that I went about it in the right way. The idea is that if two processes act independently on the same system it might look like this:
But then the two versions of the system must somehow be put together again and the addition on seemed a sensible way. What I would like to get for the Radon-Nikodym derivative, at least to first order, i.e., for the expectation value of , is
which appears like an average of the two systems(?).
Yes, using the convolution is a frequently used way to combine the effects of two independent processes on a system. It makes sense whenever the two effects are not only probabilistically independent, but also do not interact.
On $\mathbb{R}$, one will typically be working with ordinary probability density functions, which are just RN derivatives with respect to the usual (Lebesgue) measure on $\mathbb{R}$. If we denote this Lebesgue measure by $\lambda$, then there's a nice formula for expressing the density of a convolution: $\frac{d(\mu * \nu)}{d\lambda}(x) = \int \frac{d\mu}{d\lambda}(x - y)\, \frac{d\nu}{d\lambda}(y)\, \lambda(dy)$.
Just adding the RN derivatives doesn't work, because the result is not even the RN derivative of a probability measure again: if you integrate, you'll see that it has a normalization of $2$. But if you put a factor of $\frac{1}{2}$ in front, then you get a well-defined probability measure again, which is known as the mixture.
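Concretely, writing $p = \frac{d\mu}{d\lambda}$ and $q = \frac{d\nu}{d\lambda}$, the normalization computation is just
$$\int (p + q)\, d\lambda = 1 + 1 = 2, \qquad \text{so the mixture has density } \tfrac{1}{2}(p + q).$$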
Note that the mixture makes sense independently of whether $M$ is a monoid or not, as it doesn't use the addition on $M$. So it's quite a different kind of thing than the convolution, both mathematically and in terms of what it means.
Yes, that makes sense, thank you, @Tobias Fritz! I will try to be more specific. I'm also happy to explain why I am interested in this setup but will stick to the topic for now.
It seems to me that both convolution and mixture are required, though I don't know how exactly. But I can be more specific about the processes and (that I will call and from now on). To understand their structure, another random variable must be taken into account. Intuitively, it partitions the population into collectives, each with their own distribution over the range of : while the random variable denotes the property of a sample, denotes the "collective" from which the sample was drawn. Now two processes and act on the population concurrently, see setup.jpg. Moreover, while these two processes are meant to be somewhat independent, there are two ways in which they may be assumed to be constrained (CA and PA), see CAPA_equations.pdf. In these diagrams, is the conditional distribution as in Fritz (2019), "A synthetic approach to Markov kernels...". I will briefly explain the intuition behind these equations. Equation (a) (= equation (c)) says that doesn't change the composition of the collectives: the mapping that gives the property distribution within a collective before occurs also gives the correct distribution afterwards. While the 'size' of a collective may change, its composition doesn't. Equation (b) says the same the other way around: changes the distribution over the property without regard to the collective. Finally, equation (d) says that doesn't affect and all changes to the distribution over occur internal to the collectives.
I am sorry if my remarks are more confusing than helpful. I have much more to say about these things and am happy to explain further.
Those are nice diagrams! You may well be the first one to use Markov category diagrams for actual mathematical modelling :smile: (Although computer scientists like @Sam Staton similarly use probabilistic programming languages for mathematical modelling, and those are often even more powerful and expressive, while the Markov categories framework is less powerful but very general.)
We've already discussed a bit how to deal with convolution by using the monoid structure, which you can use as just a box like and in the diagram. Taking a mixture works quite similarly: it's also a morphism which merges the two inputs into one output, but it does something quite different: instead of adding up the inputs, it will randomly select one of its two inputs, one with probability $p$ and the other with $1 - p$, for some parameter $p$. It then uses that random selection as its output.
Of course, whether either or both of these should be used will depend on your application. For example, one possibility could be to employ both by using the convolution on the two 's and the mixture on the two 's, or the other way around.
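In Python, the difference between the two merging operations might look like this (just a sketch for $M = \mathbb{R}$ under addition; the names are made up):

```python
import random

def convolution_sample(x, y):
    """Merge by the monoid structure on M = R: add the two inputs.
    With x ~ mu and y ~ nu independent, x + y is a sample of mu * nu."""
    return x + y

def mixture_sample(x, y, p=0.5):
    """Merge by random selection: output the first input with
    probability p and the second with probability 1 - p."""
    return x if random.random() < p else y
```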
BTW the diagrams look sensible and interesting, so don't worry about them being confusing; they're very clear!
Thank you very much! I'll have to think about this.
I think I've got what I was hoping for. The attached diagram (Model.jpg) should work for any map and a convolution with respect to any monoid (perhaps abelian group) on . denotes the mixture @Tobias Fritz referred to above.
For I am thinking of the average for now. Admittedly, I am not sure what to think of the mixture. But all seems to fit nicely. To complete the puzzle I need to assign a statistical model for the Radon-Nikodym derivative of the complete map . For an individual, the regression is supposed to predict based on the property and the property of the individual's collective given by applied to the distribution over that is internal to this collective, .
The idea is that the CA equations from above correspond to the regression
while the PA equations correspond to the same regression with transformed coordinates
Does this make sense?
Now there I'm a little confused. You have , and therefore , right? So then shouldn't the domain of also be ?
Yes, something is wrong there. I am not at all confident in my understanding of the monad action. In particular, I am not sure how the monad is visible in the string diagrams. Here, however, it seemed to me that the Kleisli morphism takes us up one step in the monad, a step that has no obvious counterpart on the string connecting (or ) directly to the convolution on the same side of the diagram. Since the convolution following is pointwise, we need to get down again, and we should be free to choose the way.
Concerning the question of how the monad is visible in the string diagrams: it is not! At least not in the "plain" string diagrams like they're usually used. Depending on what exactly you want to do, this can either be a feature or a bug. Let me elaborate a bit on this.
Obviously when you do want to reference the monad explicitly, then it's a bug. For example, in the draft paper that we're currently writing, we need to reference the monad explicitly, and we therefore need to extend the Markov categories formalism in order to facilitate this. We do so by assuming that there is a bijective correspondence between deterministic morphisms $X \to PY$ and general morphisms $X \to Y$ for all $X$ and $Y$. In particular, we obtain a map $\mathrm{samp} \colon PX \to X$ to be interpreted as sampling from a distribution (which sounds similar to your map), and a deterministic map $\delta \colon X \to PX$ which plays the role of assigning to every point the Dirac delta distribution at that point.
The bijective correspondence between deterministic and general then lives on top of the string diagrams and doesn't really interact with them. I'm pretty sure that there are ways to do better, in the sense that one can probably have a graphical calculus in which that correspondence is itself part of the graphical syntax in an intuitive way, for example by using things like functorial boxes. But I don't think that this has been worked out yet.
On the other hand, the fact that the string diagrams do not reference the monad can also be a feature, because there are many Markov categories which are not Kleisli categories of monads. The "plain" string diagrams can still be interpreted in these categories as well, and theorems on Markov categories are often still applicable and interesting. Hence, the inability to reference the monad is what gives us greater generality.
Sorry if this is too much information! So am I understanding correctly that your statement about taking us one step up is precisely the possibility to interpret it either as a deterministic morphism $X \to PY$ (meaning essentially a measurable map) or as a generic Kleisli morphism $X \to Y$? If so, and if this distinction is important to you, then take a look at the string diagrams in the later sections of our draft; there, we use the bijective correspondence above by writing $f^\sharp$ for the deterministic counterpart of any $f$. In the other direction, you recover $f$ from $f^\sharp$ by composing with $\mathrm{samp}$.
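Schematically, the correspondence and the two canonical maps fit together like this (glossing over the details in our draft):
$$\{\text{deterministic } X \to PY\} \;\cong\; \{\text{all morphisms } X \to Y\}, \qquad f = \mathrm{samp}_Y \circ f^\sharp, \qquad \delta_X = (\mathrm{id}_X)^\sharp.$$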
But if you're not sure whether you actually need to reference the monad, then it'll be better to simply work with the Kleisli morphism picture. I think that you should stick with this picture until you arrive at a point where you have to reference $P$ explicitly, which may already have happened, and then switch to the more expressive formalism.
Thank you for all this, @Tobias Fritz! I am very happy this makes sense to you.
When I wrote the ill-defined with , I wasn't thinking about the monad much. But I knew that the purpose of the output of is to serve as input to compute a collective property depending on the collective's composition. In the convolution, this collective property then interacts with the individual property given by the output of . It seems to me that if connects directly to the convolution, the latter computes the pairwise interaction between the individual property and the individual properties of the other individuals in the same collective. In general, however, this is not enough, since the convolution is to combine the individual property with the collective property as a whole. That being said, for the case I have in mind this distinction is not necessary, because seems to do the same as the 'implicit' summation(?).
With respect to your distinction between deterministic morphisms and Kleisli morphisms I am therefore quite convinced that should be considered deterministic here. To an individual it assigns the distribution over that represents the property distribution in the individual's collective. The map may become non-deterministic when individuals can belong to multiple collectives (overlapping collectives) but this should not be required for the present purposes.
I didn't have time to get an understanding of your draft (or the other paper you mentioned) yet but it makes me incredibly happy that these questions are relevant to you. I am very much looking forward to reading your draft in more detail.
PS: I'm not sure this helps but I think that the random variable is somewhat similar in spirit to the copower described in Jacobs (2017), 'Hyper Normalisation and Conditioning for Discrete Probability Distributions'. The fact that individuals are organised in collectives turns the full distribution over into a hyperdistribution of collective distributions. The normalisations discussed in this paper are surely relevant for what I am trying to do but I don't know how exactly yet.
You're welcome!
I don't know enough about your situation, and in particular about what individuals and collectives are, to follow your arguments in detail. But I get the impression that your map is exactly what we call the sampling map, in which case I think that you should be able to work with the Kleisli picture after all; because composing the deterministic $f^\sharp$ (in our notation) with the sampling map produces exactly its non-deterministic counterpart $f$. Note that by virtue of being a Kleisli morphism, any morphism $X \to Y$ also assigns a distribution over $Y$ to every element of $X$, since this is what Markov kernels (Kleisli morphisms of the Giry monad) do. This is exactly the same as what the deterministic counterpart does; in other words, these two maps encode the same information, and the difference between them is merely syntactical. And composing with $\mathrm{samp}$ is exactly what takes you from the latter to the former.
Does this make sense to you? Apologies if this has already been obvious.
The connection with Jacobs's hypernormalization is also intriguing to me. I also have the impression that hypernormalization is a deep and somehow fundamental concept for probability. This raises the question of whether it can be implemented within the Markov categories framework. I think that doing so will in particular require generalizing it beyond the discrete case. I am now realizing that some recent additions to our draft seem to shed light on this, but it's a bit too preliminary right now for me to say anything further.
Yes, that makes sense. I like the sampling map very much. Also, you're right that should be this map. It is much nicer than taking the average.
By individual and collective I mean the following: the population we're looking at is made up of units that each have a property described by . Moreover, the units are organised into collectives that partition the population. The random variable denotes the property of a randomly drawn unit along with an 'identifier' of the collective that unit is part of. The map turns the identifier into the composition of the collective it denotes, given as a distribution over .
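As a toy illustration of what I mean, in Python (all names hypothetical, just to fix ideas):

```python
# A population partitioned into collectives; each unit has a phenotype value.
collectives = {
    "A": [0.1, 0.1, 0.3],
    "B": [0.5, 0.7],
}

def collective_composition(identifier):
    """Turn a collective identifier into the composition of that collective,
    given as the empirical distribution over phenotype values."""
    members = collectives[identifier]
    return {v: members.count(v) / len(members) for v in set(members)}

print(collective_composition("A"))  # {0.1: 0.666..., 0.3: 0.333...}
```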
I have a manuscript I wrote last year and failed to publish so far. I gave up trying for now, and that's fine. In the manuscript I am trying to make the point I am aiming for with this discussion, but in plain language and along basic calculations. The value of the manuscript lies not in mathematical insights but, I hope, in the clarifications of certain concepts and methods in evolutionary biology. I am not sure how much sense it makes to someone not familiar with the particular research questions, nor am I sure about the correctness of the arguments. I do think, however, that the intuition behind these diagrams (Model2.jpg) is adequately described in the manuscript albeit in different terms.
I have been contemplating for a while whether to post the manuscript on a preprint server. My supervisor and coauthor gave his permission. If you would like to read the manuscript, I would be happy to post it.
Tobias Fritz said:
The bijective correspondence between deterministic and general then lives on top of the string diagrams and doesn't really interact with them. I'm pretty sure that there are ways to do better, in the sense that one can probably have a graphical calculus in which that correspondence is itself part of the graphical syntax in an intuitive way, for example by using things like functorial boxes. But I don't think that this has been worked out yet.
I've been thinking about this sort of thing recently. I don't know how helpful this is for the discussion, but here's how I'd draw what you describe above. Note that I haven't worked out anything formally; this is just pictures. (Not that being pictures makes them informal; I just haven't gone through and worked out exactly what axioms everything should obey, and it's possible something ends up breaking the whole thing.)
Following your paper with Paolo, "Bimonoidal Structures of Probability Monads", we represent the object PX as a "tube" surrounding X, like this (X on the left, PX on the right):
Then a monad consists of families of morphisms $\eta_X \colon X \to PX$ and $\mu_X \colon PPX \to PX$
such that
These are just string diagram representations of the usual commutative square and triangle, asserting that the various ways of going from $PPPX$ to $PX$ and from $PX$ to $PX$ should be equal.
Since we're in a Markov category and want to distinguish between stochastic and deterministic morphisms, I'll draw stochastic morphisms with a curved edge and deterministic ones as square, like this
We want to make an equivalence between stochastic morphisms of the form $X \to Y$ and deterministic morphisms of the form $X \to PY$, as drawn above.
In a suitable class of Markov categories (I guess actually in any Kleisli category) we will have another family of canonical morphisms, the "sampling" operation $\mathrm{samp}_X \colon PX \to X$ for each $X$, which maps points of $PX$ (i.e. distributions) stochastically to points of $X$. Let's draw that like this:
Then we can simply write
which I find quite pleasing.
In the other direction we have
which could be taken as the definition of $f^\sharp$. In symbols this says $f^\sharp = Pf \circ \delta$. Although $f$ is a stochastic morphism, we can regard $Pf$ as a deterministic map, given by the Chapman-Kolmogorov equation.
We should also have these equations for how $\mathrm{samp}$ interacts with $\eta$ and $\mu$:
They look like simplified versions of the monad laws, which makes some intuitive sense to me, because in the Kleisli category every object is really an object of the form $PX$ in the base category. So these are actually the monad laws, just with one level of application of $P$ removed. Because of this, I'd guess that all the stuff in the bimonoidal structures paper will also work in this context, but I haven't worked through it.
I've been using notation like this informally for a while. It seems to be quite useful, because it combines the convenience of Markov categories with the ability to consider distributions explicitly when needed.
For more on the "tube diagram" notation there are a couple of blog posts by Joe Moeller, at https://joemathjoe.wordpress.com/2020/06/23/a-different-string-presentation-of-monads/ and https://joemathjoe.wordpress.com/2020/07/09/tube-diagrams-for-monoidal-monads/, as well as the paper by Tobias and Paolo, at https://arxiv.org/abs/1804.03527
Christoph Thies said:
... the intuition behind these diagrams (Model2.jpg) ...
I think I got this model all wrong. The map seems to refer to how the variables are utilised within and . Ultimately, we're interested in a model of the map . I'll have to think more about this.
Yep, functorial boxes and shadings are great! What seems to be missing so far is a complete set of rules for how they interact with the monoidal structure and Markov category structure; if such a thing was available, then we'd certainly be using it already, and probably Christoph and some others would do so as well. So if you or someone else were to propose a complete string diagram calculus, say for affine symmetric monoidal monads on cartesian monoidal categories, then that would come in very useful! One thing to keep in mind is that the string diagrams in our bimonoidal structures paper are at the level of the original category, meaning that the diagrams depict everything at the level of deterministic morphisms, while in this thread we all seem to be using string diagrams in a Kleisli/Markov category.
Yes, Tomáš has also proposed to use a separate box style for deterministic morphisms. This could also be useful, but there are a couple of caveats that make me personally uncertain about whether it should really be done:
1) What if a morphism is not known to be deterministic a priori, but later on in the course of a proof is shown to be deterministic? Does it then get denoted differently, and could that be confusing?
2) What if neither morphism in a given diagram is deterministic, but a certain composite or subdiagram is?
3) On a vaguely related note, in the work that we're currently doing on the comparison of statistical experiments, it's becoming increasingly clear that properties holding merely "almost surely" come up a lot, as do almost surely deterministic morphisms.
Perhaps there's a more elaborate notation to take care of the latter two points?
@Christoph Thies, I can now follow the explanation of individuals and collectives and understand how it models population biology. I still think that achieves the same thing as . If you use the identifier of a collective as input to , then you simply get a random element of as output, and if you use the same input many times, then you get different elements sampled from the corresponding distribution. That's why and the composition are one and the same Kleisli morphism. Right? Of course this is not really specific to probability theory or Markov categories but part of the formalism of Kleisli categories in general.
I'll answer quickly because I have to go, please excuse mistakes. The reason I would like to think of the output of as element of is that a sample that is taken subsequent to has attached to it the distribution over that characterises the collective the sample is part of. The next step (the convolution) is the interaction between a collective effect computed from the attached distribution and an individual effect computed from the property of the sample itself. If and the computations and interaction within are identities and addition, resp., the difference between and might not matter.
On having a special style for deterministic morphisms, I tend to use the square box for "known to be deterministic" and the rounded edge for "possibly stochastic." If a composite turned out to be deterministic, I'd just write it as something like
I see it more as a typographical convention than a formal thing - I find it makes the diagrams easier to read in my paper notes.
I keep thinking there should be a better notational way to take care of "almost surely" in general, but I haven't hit on it yet.
On the Kleisli category versus the original category, what I was thinking this morning was that, if we want to, we can restrict the domain of P to its Kleisli category, and then we end up with a monad defined on the Kleisli category instead of the original category, and we should be able to use a similar graphical calculus for that. I speculated that a lot of the stuff from the bimonoidal structures paper will carry over to that context, but I agree that that work needs to be done.
Christoph Thies said:
I'll answer quickly because I have to go, please excuse mistakes. The reason I would like to think of the output of as element of is that a sample that is taken subsequent to has attached to it the distribution over that characterises the collective the sample is part of. The next step (the convolution) is the interaction between a collective effect computed from the attached distribution and an individual effect computed from the property of the sample itself. If and the computations and interaction within are identities and addition, resp., the difference between and might not matter.
Okay, great! If the collective effect computed from the distribution depends on the distribution in a nonlinear way, then I agree that the distribution itself will have to be used as input. Whereas if the effect depends on the distribution linearly, then it can be computed by sampling from the distribution first and then using the resulting element of $X$ as input to the effect; because then the overall effect is precisely the one given by taking the expectation over all the samples, and the Kleisli composition takes care of the formation of that expectation for you.
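A finite toy example of the linear/nonlinear distinction (just a sketch; the numbers are made up): the mean of a distribution can be computed through sampling, while the variance cannot be read off from a single sample:

```python
import numpy as np

rng = np.random.default_rng(1)
support = np.array([0.0, 1.0, 2.0])
p = np.array([0.2, 0.5, 0.3])  # a distribution on the support

# Linear effect (the mean): sample first, then average; the Kleisli
# composition takes care of this expectation for you.
mean_direct = float(p @ support)
mean_via_sampling = rng.choice(support, p=p, size=200_000).mean()
print(mean_direct, round(float(mean_via_sampling), 3))  # both ~1.1

# Nonlinear effect (the variance): a single sampled element of X carries
# no information about it, so here one needs the distribution itself.
variance = float(p @ support**2) - mean_direct**2
print(variance)  # 0.49
```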
I imagine that there are plenty of effects in population biology which depend on the distribution in a nonlinear way. And this is the case in your situation? For example, I guess a diverse population has higher fitness than a uniform one, so that the fitness is a nonlinear function of the distribution? Is this more or less right? (Apologies if I'm using the terms incorrectly; I know that fitness usually refers to individuals, so perhaps I should be referring to something like adaptability at the population level when trying to express the advantage of diversity?)
Nathaniel Virgo said:
On having a special style for deterministic morphisms, I tend to use the square box for "known to be deterministic" and the rounded edge for "possibly stochastic." If a composite turned out to be deterministic, I'd just write it as something like
I see it more as a typographical convention than a formal thing - I find it makes the diagrams easier to read.
Cool. So then in the situation of the following statement in Infinite products and zero-one laws in categorical probability,
would you keep the phrase " is deterministic" as it is, since expressing it string-diagrammatically would not simplify anything, and use a separate notation for deterministic morphisms only when it can clearly help the reader? That sounds like something worth considering.
On the Kleisli category versus the original category, what I was thinking this morning was that, if we want to, we can restrict the domain of P to its Kleisli category, and then we end up with a monad defined on the Kleisli category instead of the original category, and we should be able to use a similar graphical calculus for that. I speculated that a lot of the stuff from the bimonoidal structures paper will carry over to that context, but I agree that that work needs to be done.
Right. One thing to be careful with is that a monad usually does not extend to a monad on its Kleisli category, as I've had to learn the hard way by being confused about it and then being corrected by my coauthors. The (only) thing that fails is the naturality of the unit! In the probability monad context, when you compose a non-deterministic Markov kernel $f \colon X \to Y$ with $\delta_Y$, then the composite returns a random delta distribution on $Y$; but the other composite $Pf \circ \delta_X$ is actually deterministic, and its image is not contained in the delta distributions. The two coincide only after composing with the sampling map $\mathrm{samp}$.
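In symbols, for a non-deterministic Kleisli morphism $f \colon X \to Y$ (notation as above, and modulo my glossing over where exactly each composite lives):
$$\delta_Y \circ f \;\neq\; Pf \circ \delta_X \quad \text{in general}, \qquad \text{but} \qquad \mathrm{samp}_Y \circ \delta_Y \circ f \;=\; f \;=\; \mathrm{samp}_Y \circ Pf \circ \delta_X.$$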
Tobias Fritz said:
Okay, great! If the collective effect computed from the distribution depends on the distribution in a nonlinear way, then I agree that the distribution itself will have to be used as input. Whereas if the effect depends on the distribution linearly, then it can be computed by sampling from the distribution first and then using the resulting element of $X$ as input to the effect; because then the overall effect is precisely the one given by taking the expectation over all the samples, and the Kleisli composition takes care of the formation of that expectation for you.
That seems correct to me.
Tobias Fritz said:
I imagine that there are plenty of effects in population biology which depend on the distribution in a nonlinear way.
Yes, higher-order effects. For example, not only do the units subject to causal processes evolve, but so do the units that constitute those processes.
Tobias Fritz said:
And this is the case in your situation?
For now I don't need this, linear is sufficient. My goal is to recreate the multilevel Price equation, an equation that formalises the biological process of selection, in category-theoretic terms. The Price equation is equivalent to a linear regression.
Tobias Fritz said:
For example, I guess a diverse population has higher fitness than a uniform one, so that the fitness is a nonlinear function of the distribution?
Yes, that would be an example where knowledge of the average is insufficient to determine fitness.
Tobias Fritz said:
Is this more or less right?
Yes, perfect!
Tobias Fritz said:
Apologies if I'm using the terms incorrectly
That's fine. Also, many terms are not clearly defined.
Tobias Fritz said:
I know that fitness usually refers to individuals, so perhaps I should be referring to something like adaptability at the population level when trying to express the advantage of diversity?
That's a far-reaching question. What replication could mean on higher levels and how it could be formalised is largely unclear. Let's think about this once we're done with selection!
I have a new version of the model. It's surely not without mistakes but it looks like a big step to me. It seems to do what I hoped for and more. The diagram shows the PA version of the equations above. In the diagram, and denote collective and individual fitness, resp.; .
In the CA version, the right leg just applies and discards . The left and right leg of the diagram represent and , resp. ( is discarded in both legs). The intuition behind is the following: In , the collective phenotype (output of ) is evaluated in . (sorry for the notation) converts the outcome to the corresponding distribution over . In , the individual phenotype interacts with the collective phenotype at the convolution. The output is evaluated in to give the (relative?) distribution over . The mixture combines the two copies of the system.
I'm sure something is wrong around . I think it's to do with normalisation and the fact that the collective distributions are not full distributions (not summing up to one).
I have to admit I am somewhat overwhelmed by how much sense this makes, @Tobias Fritz. Everything fits together. I feel compelled to post my manuscript on BioRxiv now, as a draft. Do you think this might be a bad idea? I'd post it as I wrote it last year, without category theory.
Christoph, I'm sorry, but I personally am not competent to comment on a manuscript outside of my areas of expertise. Perhaps you can ask another mathematical biologist who has studied the Price equation for feedback? For example, Matteo Smerlak and his coauthors have worked on mathematically sophisticated approaches to evolution involving probabilistic dynamics, for example in Limiting fitness distributions in evolutionary dynamics. Perhaps they would be able to comment? In any case, please let us know if/when you post it, as I'd be curious to take a look and learn a bit more about it, even if I won't be able to assess its merits.
I'll just post it here then, for now: CAPA.pdf
That looks like a really nice paper! I don't think that I'll be able to read it in detail, but the parts that I've read (in particular the introduction) are quite interesting and made good sense. So I very much hope that this will be of interest to mathematical biologists as well!
Thank you, @Tobias Fritz !
I would like to fix the left leg of the diagram above. With the incorrect composition ( with and ) I am trying to say that collective selection acts on but is determined by the output of . How can I express this?
Now I'm admittedly getting more confused. I thought that your was a morphism , namely the sampling map? But now the input of is . I also don't know what and are.
I am sorry for the confusion, @Tobias Fritz. I didn't write this down correctly. Let's see the maps involved:
I was thinking of and as somehow representing the part that is left to explain in the complete process. Everything else seems specified. We have and I thought also . Now it seems needs another input to determine the mapping that acts on , like this Model4.png.
But then it would seem that the same is required for : one input determines the function that acts on the other. And both inputs are identical, i.e. belong to the same individual. That's nice. Does it make sense?
Like this: Model5.png
Where is my regression? :rolling_eyes:
Well, as I've pointed out a number of times before, we have , so it seems to me that this coincides with what you now denote .
So your and are the same components of the model as the morphisms that you had previously denoted and ?
Tobias Fritz said:
Well, as I've pointed out a number of times before, we have , so it seems to me that this coincides with what you now denote .
Yes. I think that's ok. In the additive case any random element of the associated collective will probably do. I was getting ahead of myself talking about functions of distributions.
So your and are the same components of the model as the morphisms that you had previously denoted and ?
No, it's like this, I think: Model5.1.png
Okay! Then I'm not sure why you use two different symbols to denote the same morphism, but otherwise it makes sense to me :smile:
Nice!
Tobias Fritz said:
I'm not sure why you use two different symbols to denote the same morphism
Which symbols are you referring to?
but otherwise it makes sense to me :smile:
That makes me very happy!
Great! I thought that we had agreed that and denote the same morphism because they're both equal to . That's what I've been referring to.
I see. Yes. Here's both versions: Model5-PA.png, Model5-CA.png.
I think my regression is not far away. Consider in CA. Suppose acts on the left input with the right input controlling the mapping. For a sample we therefore get a map . Since our individuals breed true (no mutation, i.e., offspring cannot differ from their parents in phenotype ) and we have no migration (no influx of -values not previously present in the population) with the projection we can assume
Therefore we have a Radon-Nikodym derivative . For we have , ready for regression!
Sorry, @Tobias Fritz, I messed it up completely! The RN derivative is given by
Does this make sense?
How convenient that the RN derivative automatically yields exponential behaviour of the frequencies in when iterated.
Christoph, I'm afraid that I'll have to take a break from the discussion (for now) - I'm moving to Austria! And organizing things is now starting to keep me quite busy.
Yes, sure, Tobias. Those last days chatting with you were quite exciting for me. I apologise if I was not considerate towards your time. All the best for your move! Austria is very nice.
If I may ask, do you think you'll be around again anytime soon? I need to finish my PhD thesis before long and reporting the things we discussed here would be very useful for me. It seems to me I'm not far away but I need your help :see_no_evil:
In case you are interested, here are all three videos by Prakash: https://www.youtube.com/playlist?list=PLaILTSnVfqtI6MDWQUqB2mIhx1USzXkj4
Tomáš Gonda said:
Does anyone know of a theorem in categorical probability that could be regarded as a categorical version of the Radon-Nikodym Theorem? I have wondered about this a couple of times, but a short literature search never turned up a result I'd be happy with.
I don't know if it has already been mentioned in this thread (I didn't read all of it), but Bunge & Funk describe a topos-theoretic Radon-Nikodym theorem in their book Singular Coverings of Toposes.
Hi @Alexander Gietelink Oldenziel, could you say which theorem in Bunge&Funk's book you mean?
Peter Arndt said:
Hi Alexander Gietelink Oldenziel, could you say which theorem in Bunge&Funk's book you mean?
Hi Peter! I was thinking of section 6.2, about inverting distributions.
There is another paper where Marta Bunge explicitly says it is an analog of Radon-Nikodym.
I spent some time thinking about analogies of conditional probability and sigma algebras in this context. We can talk a little about it if you want, though I didn't get very far.
Ah, wow, looks like quite a journey from classical Radon-Nikodym to that chapter!
Yes, I would love to talk about that, just need to find some time...
Christoph Thies said:
Here's both versions: Model5-PA.png, Model5-CA.png.
Hello,
I have been thinking more about the equations I tried to build before. Using diagrams like those by @Nathaniel Virgo above, the collective is now represented explicitly in terms of the monad.
In an experimental setting, individuals are organised into collectives that in turn make up the population. The equations describe an episode of selection that acts on both the individual and the collective phenotypes.
The element of on the left comes about as follows, I think. A collection of collectives of individuals is given as a collection of distributions over , the space of individual phenotypes, that is an element of a coproduct . The monad unit induces a map . This situates the multilevel Price equation, as in Gardner, A., "The genetical theory of multilevel selection", Journal of Evolutionary Biology, 2015, 28, 305-319, Equation (5) (the author considers the genetic value as phenotype), in the context of the diagram below.
In the lower branch in the monad diagrams above, collective composition, i.e., the inner distributions, should remain unchanged. The lower branch therefore has a side branch that keeps the inner tube so that it can be restored after . This looks a little awkward. Is there a more elegant way to represent this invariance? Could the inner distributions tunnel through the box ? :caterpillar:
I made some progress on this and wonder if someone is interested or would have a look to point out mistakes.
Consider the probability monad and . Then I'd like to write the two models sketched above as follows.
CAPAInMonads.png
Moreover, the map (and, similarly, ) satisfies the diagrams below.
MonadHomomorphism.png
The latter diagrams seem similar to those in the definition of morphisms of monads on the nLab (https://ncatlab.org/nlab/show/monad, Section "The bicategory of monads"), but I can't follow the description there. Is it correct to say that is a morphism of monads with 1-cell (and a 2-cell that I cannot write but that seems to be the identity as well)?
What would help you follow the definition in the nLab?
A morphism of monads is first of all a natural transformation. Do you have such a map for all objects , or just for one?
John Baez said:
What would help you follow the definition in the nLab?
I suppose I'd have to learn what exactly bicategories are. I dodged this so far as I am afraid they'll drag me in further. It seems to always make sense to think beyond.
Okay, you don't need to know what a bicategory is. If you're trying to understand a morphism of monads, that doesn't matter much.
That's what I was hoping! Could you point me to a reference that describes morphisms of monads without bicategories?
No. I'm sure one exists; I just don't know it. I would just look at the nLab page's definition of "morphism of monads", which does not require that you know about bicategories.
Take that definition, and where they say "1-cell", read "functor". Where they say "2-cell", read "natural transformation".
Where they say "monad in K" read "monad in Cat", i.e. plain old monad.
I'll try that. Thank you!
I typed "monad morphism" into Google and instantly got this:
https://mathoverflow.net/questions/92093/functors-between-monads-what-are-these-really-called
This is a guy who defines morphisms of monads without knowing what they're called.
His "natural map" must be a natural transformation.
With luck this definition will exactly match the nLab definition if you translate between the terminologies. With luck the key equations will agree. If you can get them to match up, you've probably got the right idea.
Paolo Perrone said:
A morphism of monads is first of all a natural transformation. Do you have such a map for all objects , or just for one?
I have a map for one , but the construction works for any .
To explain why these diagrams are relevant, I'll describe the map that I'll call from now on (forget about as well). It is given by scaling the distribution pointwise and then normalising. With and normalisation ( is the unnormalised monad and is the inclusion), is given by
This construction makes satisfy the diagram below because normalisation reverses the scaling.
Unit.png
also satisfies the second diagram. To see this I did calculations similar to those you demonstrated in your recent talk on partial evaluations (https://www.youtube.com/watch?v=ynxfrlqr4I0).
Multiplication.png
How exactly do you scale the distribution pointwise? Could you give an example?
Paolo Perrone said:
How exactly do you scale the distribution pointwise? Could you give an example?
For with
But then comes normalisation and, as I see now, the last diagram is not generally satisfied. That's just as well, because the following two diagrams would otherwise seem to say the same thing.
Multiplication.png
MultiplicationString.png
I would now draw the two versions of the process as follows.
CAPAInMonads2.png
In the right hand version (PA), it is necessary that does not slide out of the tube! In fact, the whole point of the distinction is that in CA, is applied across the metapopulation, and in PA, is applied within the populations.
I'm quite convinced about the diagram for the unit, though.
Unit.png
UnitString.png
It says (I think) that there is no mutation or other funny stuff happening in that creates novel things, i.e., that increases the support of the distribution.
What I still would like to say but don't know how is that leaves the inner expression unchanged.
I got myself into a bit of a pickle with the names, which I'd like to sort out. Below is an overview in which the processes on the right refine the process on the left.
CAPAOverview-1.png
More specifically, there are maps and such that is given by
and is given by
Moreover, and satisfy the diagrams below.
Unit1.png
Unit2.png
The latter equality seems to say that leaves the inner distributions unchanged.