Does anyone know of a theorem in categorical probability that could be regarded as a categorical version of the Radon-Nikodym Theorem? I have wondered about this a couple of times, but a short literature search never turned up a result I'd be happy with.
Prakash Panangaden gave a very nice talk at UCR where he used the theorem almost synthetically: https://categorytheory.zulipchat.com/#narrow/stream/229966-ACT.40UCR-seminar/topic/April.208th.3A.20Prakash.20Panangaden
This question has been on my mind over the past week, and I'd now like to give a partial answer from the Markov categories perspective. Naively, one might think that Markov categories are not expressive enough to talk about densities and the Radon-Nikodym theorem. I think that this is largely true, but one can sidestep the appeal to densities and the Radon-Nikodym theorem to some extent. (And there may also be a possibility to build densities natively into the framework, but I wouldn't yet know how to do that.)
But first, why are densities important? I can see two main reasons besides the Radon-Nikodym theorem:
The opposite variance of densities is related to the fact that Bayesian inversion can be described as a dagger functor, as explained in Remark 13.10 of my Markov cats paper. So while I don't know how to formulate (let alone prove) a general Radon-Nikodym theorem for Markov categories, there is a more particular construction which works in any Markov category with conditionals. Namely if $f \colon X \to Y$ is any measurable map, $\mu$ a probability measure on $X$ and $\nu$ a probability measure on $Y$ with $\nu \ll f_*\mu$, then we can form a new measure on $X$ given by $f^*\!\left(\frac{d\nu}{d f_*\mu}\right) \cdot \mu$. Here, $f^*$ denotes the pullback of functions by composition as above.
How can we construct this new measure using only the Markov category structure? This is possible because that measure turns out to be given by the composite $f^\dagger \circ \nu$, where $f^\dagger \colon Y \to X$ is a Bayesian inverse of $f$ with respect to $\mu$; showing this just requires some calculation, but I think it is closely related to the construction of conditional expectations in terms of the Radon-Nikodym theorem. This measure can be shown to be well-defined, i.e. independent of the particular choice of Bayesian inverse, as soon as $\nu \ll f_*\mu$ holds; in a Markov cat, this means by definition that $f_*\mu$-a.s. equality of morphisms out of $Y$ must imply $\nu$-a.s. equality. So in particular types of situations, we can get around the Radon-Nikodym theorem in a way which makes sense in any Markov category with conditionals.
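In the Kleisli category of the Giry monad, the calculation should go as follows (writing the Bayesian inverse as a kernel $f^\dagger(dx \mid y)$ just for this sketch): for measurable $A \subseteq X$,
$$(f^\dagger \circ \nu)(A) = \int_Y f^\dagger(A \mid y)\, \nu(dy) = \int_Y f^\dagger(A \mid y)\, \frac{d\nu}{d f_*\mu}(y)\, (f_*\mu)(dy) = \int_A \frac{d\nu}{d f_*\mu}(f(x))\, \mu(dx),$$
where the last step uses the defining property of the Bayesian inverse, i.e. that it disintegrates $\mu$ over $f_*\mu$.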
This construction is a special case of letting a Radon-Nikodym derivative act on a measure by multiplication, and may thus seem a bit removed from the Radon-Nikodym theorem itself. This is true, but there still seem to be important applications of this special case. In particular, the abstract Fisher-Neyman factorization theorem (Theorem 14.5 of my paper) uses this type of construction (although this is not explained in the paper, because I didn't know this at the time of writing).
I'm not sure to what extent other applications of the Radon-Nikodym theorem can be sidestepped like this. It's hard to imagine that all of them can be. For example, one may ask whether for given measures on the same measurable space, with one absolutely continuous with respect to the other, the new measure would be similarly constructible from the Markov category structure and conditionals only. I have a sketch of a proof that this is not possible in a generic Markov category with conditionals.
Nice. Can you say a bit more about how you interpret $\ll$ in a Markov category? Sorry, I may have missed it in your article.
In the category of s-finite kernels, the morphisms $X \to I$ amount to measurable functions $X \to [0, \infty]$. So I think one can easily talk about a Radon-Nikodym derivative of a measure $\nu$ with respect to a measure $\mu$ as a morphism $r \colon X \to I$ such that rescaling $\mu$ by $r$ gives $\nu$. (This looks especially easy in your diagrammatic notation.) But I haven't yet tried to phrase/prove the RN theorem in this abstract categorical setting of synthetic measure theory.
Interesting! That sounds like a good reason to consider categories like Markov categories, but where instead the terminality of the monoidal unit $I$ is dropped (and replaced by the mere existence of a non-natural unit effect; I think @Arthur Parzygnat has been working with this definition).
Either way, $\nu \ll \mu$ can be defined to mean the following: if $f =_{\mu\text{-a.s.}} g$ for any two parallel morphisms $f$ and $g$ out of $X$, then also $f =_{\nu\text{-a.s.}} g$. To see that this is equivalent to the standard definition in the Kleisli category of the Giry monad (on all measurable spaces), just take $f$ to be the indicator function of a potential null set and $g = 0$; the other direction seems to follow most easily by using the RN theorem. In fact, this synthetic definition makes sense for any two morphisms $\mu$ and $\nu$ with the same codomain $X$, and I believe that the semantics of the condition in the Kleisli category of the Giry monad is then similar. Perhaps this still holds in the category of s-finite kernels? (BTW, this definition was not in my paper yet, so you can't possibly have missed it.)
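Writing it out, with $\mathrm{copy}_X$ the comultiplication, the two definitions should read:
$$f =_{\mu\text{-a.s.}} g \quad :\Longleftrightarrow \quad (f \otimes \mathrm{id}_X) \circ \mathrm{copy}_X \circ \mu = (g \otimes \mathrm{id}_X) \circ \mathrm{copy}_X \circ \mu,$$
$$\nu \ll \mu \quad :\Longleftrightarrow \quad \big( f =_{\mu\text{-a.s.}} g \implies f =_{\nu\text{-a.s.}} g \ \text{ for all parallel } f, g \text{ out of } X \big).$$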
Now following your idea of how densities act on measures, we can easily express the RN theorem synthetically, although I'm "almost sure" that it can't be proven to hold from just the categorical structure and the existence of conditionals. However, when expressed diagrammatically like this, the RN theorem acquires the flavour of a factorization property vaguely reminiscent of the existence of conditionals. So I wonder:
Puzzle: Do the existence of conditionals and the RN theorem have a common generalization?
If so, their common generalization may be a very natural candidate axiom for synthetic measure/probability theory. Or am I completely off track here?
Thanks @Tobias Fritz. Matthijs Vákár and Luke Ong worked out the theory of R-N derivatives and conditional probabilities for s-finite kernels here.
They give a general theorem for conditional probability that includes R-N derivatives as a special case (Theorem 13 and Remark 2). It would be great to write this in abstract categorical language. Is that the kind of thing you were thinking of in your puzzle?
But they use special forms of almost-sure equality and absolute continuity to deal with the possible infinities. Maybe this is your categorical formulation immediately translated to the category of s-finite kernels, but I am not yet sure.
@Sam Staton Wow! Yes, that indeed looks like a very nice answer to my puzzle.
Just to make sure I understand: in the extra condition just after footnote 7 of Ong and Vákár, I assume that should be , right? I've been trying to see how this condition could possibly be implied by my synthetic definition of absolute continuity proposed above, when applied to the category of s-finite kernels. What I seem to be getting (with and for simplicity) is that is equivalent to: for all measurable and , we have that implies . Hence I do not know how to state the generalized disintegration theorem of Ong and Vákár synthetically. Perhaps it's possible upon tweaking the definitions on either side a bit more.
That Vákár and Ong paper looks very nice indeed!
Does anyone know for a fact whether the category of s-finite kernels is (an unnormalized) Markov category in the sense of Tobias' paper?
Tobias Fritz said:
is equivalent to: for all measurable and , we have that implies .
I have no idea what you are referring to here to be honest, would you mind elaborating?
The assumption in Theorem 13 seems quite reasonable. As far as I understand it, events of both measure $0$ and measure $\infty$ are part of a "sink of no return" (SONR) in this formalism, erasing all possible distinctions that could be made by future inferences (the only exception being that the $\infty$ sink can leak into the $0$ sink as here). The assumption just says that the SONR of one measure includes that of the other, just as in the standard version of the result, right?
It seems to me that the most direct way to adapt the (aforementioned) synthetic definition of absolute continuity so that it coincides with this condition in Theorem 13 would be to impose an equality saying that copying $\infty$-measure events is the same as taking a product of $\infty$-measure events. However, this doesn't seem like a good solution in general. A somewhat orthogonal approach would be to equip the objects with a structure akin to that of localizable measurable spaces, I guess.
Hi @Tomáš Gonda, Yes, s-finite kernels do form an "unnormalized Markov category". I called it a commutative Freyd category in my paper.
Thanks @Tobias Fritz, I think you're right, the should be . I would also be interested to see how you derive and implies (for your definition of in s-finite kernels, or in any unnormalized Markov category?) I tried to manipulate the definition but couldn't get very far.
Possibly an easier first thing to try is to consider bounded kernels, i.e. kernels $k \colon X \to Y$ for which there exists a constant $C$ with $k(x, Y) \leq C$ for all $x$ [note the order of quantifiers]. I think these also form an unnormalized Markov category. This is less useful, because you don't have countable coproducts, nor I guess can you expect to have Radon-Nikodym, because some densities are unbounded (e.g. beta(0.5,0.5)). But it avoids the infinities for a moment.
Tomáš Gonda said:
Tobias Fritz said:
is equivalent to: for all measurable and , we have that implies .
I have no idea what you are referring to here to be honest, would you mind elaborating?
Well, I didn't mean to make a definite claim there; my reasoning had been quite heuristic, and perhaps my indication of that wasn't clear enough. As I do things more carefully now, I unfortunately can no longer reproduce it. But fortunately, I now actually do recover the exact conditions of Theorem 13 of Ong and Vákár! Modulo one or two points that I'm not totally sure about, which I will mark separately as bullet points. Let me now go through the reasoning, again in the special case where and .
Assuming that this holds, it implies the obvious s-finite analogue of Lemma 4.2 of my paper. As in Example 13.3, this can then be applied to characterize almost sure equality: we have if and only if
for all measurable sets and in the respective spaces. Some fiddling with the universal quantification over , and using that and may land in the one-element space specifically, shows that the synthetic absolute continuity is equivalent to the following implication: for any two $[0, \infty]$-valued functions $f$ and $g$, if
holds for all , then this implies the same property with $\nu$ in place of $\mu$. Now we can interpret an equation like this as an equality of measures defined by the densities $f$ and $g$ with respect to $\mu$. Thus by Theorem 9 of Ong and Vákár, the above equality is equivalent to the functions being $\mu$-almost everywhere equal on the finite part of $\mu$, and on the infinite part the set of points where one vanishes but the other one doesn't must have measure zero. Since it only matters where the functions are equal and where they vanish, it follows that it's enough to restrict to functions with values in, say, $\{0, 1\}$.
Indeed the first condition arises by considering and , and the second one by and .
Thanks Tobias, this is exciting.
Tobias Fritz said:
- In order to show that two s-finite measures are equal on a product of two measurable spaces $X \times Y$, it's enough to show that they're equal on measurable rectangles $A \times B$. (Correct?)
Maybe I misunderstood you, but the measurable rectangles generate the product sigma algebra, so this is true for any measure?
- Some more fiddling then shows that is equivalent to the conditions given by Ong and Vákár, namely and . (Correct?)
I agree that this is the condition in Vákár-Ong, when and , but were you asking something different?
Also, do I understand you right:
- a synthetic statement of (non-parameterized) Radon-Nikodym would be: if and (in this sense) then there exists that is a R-N derivative (in this sense) of with respect to ? and this holds in s-finite kernels?
and then the next step would be to check it's all still ok if we put and back into the theorem, to get the general form that includes disintegration?
PS I mentioned this thread to Luke and Matthijs.
Sam Staton said:
Maybe I misunderstood you, but the measurable rectangles generate the product sigma algebra, so this is true for any measure?
Good, thanks. I wasn't sure if that applies because I had only ever worked with finite measures before.
Sam Staton said:
I agree that this is the condition in Vákár-Ong, when and , but were you asking something different?
Yes, I was asking about the final "some more fiddling" step in the proof, which I haven't done completely rigorously yet, but I know how it should go.
Sam Staton said:
Also, do I understand you right:
- a synthetic statement of (non-parameterized) Radon-Nikodym would be: if and (in this sense) then there exists that is a R-N derivative (in this sense) of with respect to ? and this holds in s-finite kernels?
and then the next step would be to check it's all still ok if we put and back into the theorem, to get the general form that includes disintegration?
Yes! Let me know in case you'd like me to work out that general case of the argument, which should be fairly straightforward. I would then also produce a more streamlined and completely rigorous version of the proof (one can simplify it a bit by proving the two directions of the equivalence separately). Of course, if you or someone else were to do this, I'd be happy about that too.
Sam Staton said:
PS I mentioned this thread to Luke and Matthijs.
Great! If they're interested in seeing it or in chiming in, we can get a new invite link from a moderator.
Hello,
I have some questions regarding Radon-Nikodym derivatives in Kleisli categories. I hope this is the right place to ask. Also, please excuse inaccuracies, I am not an expert :innocent:.
Given the Giry monad $P$ on, say, measurable spaces, suppose we have $\mu$, $\nu$, and $\lambda$ with $\mu \ll \lambda$ and $\nu \ll \lambda$, so that the corresponding RN derivatives exist. Moreover, assume $M$ is a monoid with multiplication $m \colon M \times M \to M$. With $m$ and the monoidal structure of the monad given by the product distribution, we have the map
Does the structure of $m$ allow conclusions regarding $\mu * \nu$? For example, does it imply $\mu * \nu \ll \lambda$? And if $\frac{d(\mu * \nu)}{d\lambda}$ exists, can it be represented in terms of $\frac{d\mu}{d\lambda}$ and $\frac{d\nu}{d\lambda}$?
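Explicitly, I mean the composite built from the monoidal structure $\nabla$ of $P$ and the multiplication $m$ (if I've set this up correctly):
$$P(M) \otimes P(M) \xrightarrow{\ \nabla\ } P(M \times M) \xrightarrow{\ P(m)\ } P(M), \qquad \mu * \nu := P(m)(\mu \otimes \nu).$$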
Hi Christoph! That's interesting stuff. The composite map is the convolution of probability measures on $M$; for $\mu, \nu \in P(M)$, let's denote their convolution by $\mu * \nu$, which is the usual notation used by analysts. Then how about a slightly simpler version of the question like this: does $\mu \ll \lambda$ and $\nu \ll \lambda$ imply that $\mu * \nu \ll \lambda$?
I think that the answer is no in general. For example, take $M = \mathbb{R}$ considered as a monoid under addition. Like this, what convolution models is exactly the sum of independent random variables: the distribution of a sum of independent $X$ and $Y$ is exactly the convolution of the distribution of $X$ with the distribution of $Y$. Now with $\lambda$ being the uniform measure on $[0,1]$, the convolution $\lambda * \lambda$ will not be supported on $[0,1]$ only, but on $[0,2]$ instead, meaning that $\lambda * \lambda \not\ll \lambda$.
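Here's a quick Monte Carlo sanity check of that counterexample (just a sketch, with `lam` standing in for the uniform measure on $[0,1]$):

```python
import numpy as np

rng = np.random.default_rng(0)

# lam = uniform measure on [0, 1]; a sample of lam * lam is the sum of
# two independent samples of lam
x = rng.uniform(0.0, 1.0, size=100_000)
y = rng.uniform(0.0, 1.0, size=100_000)

# mass that lam * lam puts outside [0, 1]: about 1/2, so lam * lam
# cannot be absolutely continuous with respect to lam
print(np.mean(x + y > 1.0))  # ~0.5
```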
But perhaps you have a specific situation in which additional properties hold?
Thank you for your reply, @Tobias Fritz!
$\mathbb{R}$ with addition is what I had in mind, but perhaps there is more choice involved. I am asking because I would like to understand how the effects of two independent processes on the same system are combined. However, I am not at all sure that I went about it in the right way. The idea is that if two processes act independently on the same system it might look like this:
But then the two versions of the system must somehow be put together again and the addition on seemed a sensible way. What I would like to get for the Radon-Nikodym derivative, at least to first order, i.e., for the expectation value of , is
which appears like an average of the two systems(?).
Yes, using the convolution is a frequently used way to combine the effects of two independent processes on a system. It makes sense whenever the two effects are not only probabilistically independent, but also do not interact.
On $\mathbb{R}$, one will typically be working with ordinary probability density functions, which are just RN derivatives with respect to the usual (Lebesgue) measure on $\mathbb{R}$. If we denote this Lebesgue measure by $\lambda$, then there's a nice formula for expressing the density of a convolution: $\frac{d(\mu * \nu)}{d\lambda}(x) = \int \frac{d\mu}{d\lambda}(x - y)\, \frac{d\nu}{d\lambda}(y)\, \lambda(dy)$.
Just adding the RN derivatives doesn't work, because the result is not even the RN derivative of a probability measure again: if you integrate, you'll see that it has a normalization of $2$. But if you put a factor of $\frac{1}{2}$ in front, then you get a well-defined probability measure again, which is known as the mixture.
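Concretely, writing $p = \frac{d\mu}{d\lambda}$ and $q = \frac{d\nu}{d\lambda}$, the normalization computation is just
$$\int (p + q)\, d\lambda = 1 + 1 = 2, \qquad \text{so the mixture has density } \tfrac{1}{2}(p + q).$$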
Note that the mixture makes sense independently of whether $M$ is a monoid or not, as it doesn't use the addition on $M$. So it's quite a different kind of thing than the convolution, both mathematically and in terms of what it means.
Yes, that makes sense, thank you, @Tobias Fritz! I will try to be more specific. I'm also happy to explain why I am interested in this setup but will stick to the topic for now.
It seems to me that both convolution and mixture are required, though I don't know how exactly. But I can be more specific about the processes and (that I will call and from now on). To understand their structure, another random variable must be taken into account. Intuitively, it partitions the population into collectives, each with their own distribution over the range of : while the random variable denotes the property of a sample, denotes the "collective" from which the sample was drawn. Now two processes and act on the population concurrently, see setup.jpg. Moreover, while these two processes are meant to be somewhat independent, there are two ways in which they may be assumed to be constrained (CA and PA), see CAPA_equations.pdf. In these diagrams, is the conditional distribution as in Fritz (2019), "A synthetic approach to Markov kernels...". I will briefly explain the intuition behind these equations. Equation (a) (= equation (c)) says that doesn't change the composition of the collectives: the mapping that gives the property distribution within a collective before occurs also gives the correct distribution afterwards. While the 'size' of a collective may change, its composition doesn't. Equation (b) says the same the other way around: changes the distribution over the property without regard to the collective. Finally, equation (d) says that doesn't affect and all changes to the distribution over occur internal to the collectives.
I am sorry if my remarks are more confusing than helpful. I have much more to say about these things and am happy to explain further.
Those are nice diagrams! You may well be the first one to use Markov category diagrams for actual mathematical modelling :smile: (Although computer scientists like @Sam Staton similarly use probabilistic programming languages for mathematical modelling, and those are often even more powerful and expressive, while the Markov categories framework is less powerful but very general.)
We've already discussed a bit how to deal with convolution by using the monoid structure, which you can use as just a box like and in the diagram. Taking a mixture works quite similarly: it's also a morphism which merges the two inputs into one output, but it does something quite different: instead of adding up the inputs, it will randomly select one of its two inputs, one with probability $p$ and the other with $1 - p$, for some parameter $p$. It then uses that random selection as its output.
Of course, whether either or both of these should be used will depend on your application. For example, one possibility could be to employ both by using the convolution on the two 's and the mixture on the two 's, or the other way around.
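In Python, the difference between the two merging operations might look like this (just a sketch for $M = \mathbb{R}$ under addition; the names are made up):

```python
import random

def convolution_sample(x, y):
    """Merge by the monoid structure on M = R: add the two inputs.
    With x ~ mu and y ~ nu independent, x + y is a sample of mu * nu."""
    return x + y

def mixture_sample(x, y, p=0.5):
    """Merge by random selection: output the first input with
    probability p and the second with probability 1 - p."""
    return x if random.random() < p else y
```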
BTW the diagrams look sensible and interesting, so don't worry about them being confusing; they're very clear!
Thank you very much! I'll have to think about this.
I think I've got what I was hoping for. The attached diagram (Model.jpg) should work for any map and a convolution with respect to any monoid (perhaps abelian group) on . denotes the mixture @Tobias Fritz referred to above.
For I am thinking of the average for now. Admittedly, I am not sure what to think of the mixture. But all seems to fit nicely. To complete the puzzle I need to assign a statistical model for the Radon-Nikodym derivative of the complete map . For an individual, the regression is supposed to predict based on the property and the property of the individual's collective given by applied to the distribution over that is internal to this collective, .
The idea is that the CA equations from above correspond to the regression
while the PA equations correspond to the same regression with transformed coordinates
Does this make sense?
Now there I'm a little confused. You have , and therefore , right? So then shouldn't the domain of also be ?
Yes, something is wrong there. I am not at all confident in my understanding of the monad action. In particular, I am not sure how the monad is visible in the string diagrams. Here, however, it seemed to me that the Kleisli morphism takes us up one step in the monad, a step that has no obvious counterpart on the string connecting (or ) directly to the convolution on the same side of the diagram. Since the convolution following is pointwise, we need to get down again, and we should be free to choose the way.
Concerning the question of how the monad is visible in the string diagrams: it is not! At least not in the "plain" string diagrams like they're usually used. Depending on what exactly you want to do, this can either be a feature or a bug. Let me elaborate a bit on this.
Obviously when you do want to reference the monad explicitly, then it's a bug. For example, in the draft paper that we're currently writing, we need to reference the monad explicitly, and we therefore need to extend the Markov categories formalism in order to facilitate this. We do so by assuming that there is a bijective correspondence between deterministic morphisms $X \to PY$ and general morphisms $X \to Y$ for all $X$ and $Y$. In particular, we obtain a map $\mathrm{samp} \colon PX \to X$ to be interpreted as sampling from a distribution (which sounds similar to your map), and a deterministic map $\delta \colon X \to PX$ which plays the role of assigning to every point the Dirac delta distribution at that point.
The bijective correspondence between deterministic and general then lives on top of the string diagrams and doesn't really interact with them. I'm pretty sure that there are ways to do better, in the sense that one can probably have a graphical calculus in which that correspondence is itself part of the graphical syntax in an intuitive way, for example by using things like functorial boxes. But I don't think that this has been worked out yet.
On the other hand, the fact that the string diagrams do not reference the monad can also be a feature, because there are many Markov categories which are not Kleisli categories of monads. The "plain" string diagrams can still be interpreted in these categories as well, and theorems on Markov categories are often still applicable and interesting. Hence, the inability to reference the monad is what gives us greater generality.
Sorry if this is too much information! So am I understanding correctly that your statement about taking us one step up is precisely the possibility to interpret it either as a deterministic morphism $X \to PY$ (meaning essentially a measurable map) or as a generic Kleisli morphism $X \to Y$? If so, and if this distinction is important to you, then take a look at the string diagrams in the later sections of our draft; there, we use the bijective correspondence above by writing $f^\sharp$ for the deterministic counterpart of any $f$. In the other direction, you recover $f$ from $f^\sharp$ by composing with $\mathrm{samp}$.
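Schematically, the correspondence and the two canonical maps fit together like this (glossing over the details in our draft):
$$\{\text{deterministic } X \to PY\} \;\cong\; \{\text{all morphisms } X \to Y\}, \qquad f = \mathrm{samp}_Y \circ f^\sharp, \qquad \delta_X = (\mathrm{id}_X)^\sharp.$$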
But if you're not sure whether you actually need to reference the monad, then it'll be better to simply work with the Kleisli morphism picture. I think that you should stick with this picture until you arrive at a point where you have to reference $P$ explicitly, which may already have happened, and then switch to the more expressive formalism.
Thank you for all this, @Tobias Fritz! I am very happy this makes sense to you.
When I wrote the ill-defined with , I wasn't thinking about the monad much. But I knew that the purpose of the output of is to serve as input to compute a collective property depending on the collective's composition. In the convolution, this collective property then interacts with the individual property given by the output of . It seems to me that if connects directly to the convolution, the latter computes the pairwise interaction between the individual property and the individual properties of the other individuals in the same collective. In general, however, this is not enough, since the convolution is to combine the individual property with the collective property as a whole. That being said, for the case I have in mind this distinction is not necessary, because seems to do the same as the 'implicit' summation(?).
With respect to your distinction between deterministic morphisms and Kleisli morphisms I am therefore quite convinced that should be considered deterministic here. To an individual it assigns the distribution over that represents the property distribution in the individual's collective. The map may become non-deterministic when individuals can belong to multiple collectives (overlapping collectives) but this should not be required for the present purposes.
I didn't have time to get an understanding of your draft (or the other paper you mentioned) yet but it makes me incredibly happy that these questions are relevant to you. I am very much looking forward to reading your draft in more detail.
PS: I'm not sure this helps but I think that the random variable is somewhat similar in spirit to the copower described in Jacobs (2017), 'Hyper Normalisation and Conditioning for Discrete Probability Distributions'. The fact that individuals are organised in collectives turns the full distribution over into a hyperdistribution of collective distributions. The normalisations discussed in this paper are surely relevant for what I am trying to do but I don't know how exactly yet.
You're welcome!
I don't know enough about your situation, and in particular about what individuals and collectives are, to follow your arguments in detail. But I get the impression that your map is exactly what we call the sampling map, in which case I think that you should be able to work with the Kleisli picture after all; because composing the deterministic $f^\sharp$ (in our notation) with the sampling map produces exactly its non-deterministic counterpart $f$. Note that by virtue of being a Kleisli morphism, any morphism $X \to Y$ also assigns a distribution over $Y$ to every element of $X$, since this is what Markov kernels (Kleisli morphisms of the Giry monad) do. This is exactly the same as what the deterministic counterpart does; in other words, these two maps encode the same information, and the difference between them is merely syntactical. And composing with $\mathrm{samp}$ is exactly what takes you from the latter to the former.
Does this make sense to you? Apologies if this has already been obvious.
The connection with Jacobs's hypernormalization is also intriguing to me. I also have the impression that hypernormalization is a deep and somehow fundamental concept for probability. This raises the question of whether it can be implemented within the Markov categories framework. I think that doing so will in particular require generalizing it beyond the discrete case. I am now realizing that some recent additions to our draft seem to shed light on this, but it's a bit too preliminary right now for me to say anything further.
Yes, that makes sense. I like the sampling map very much. Also, you're right that should be this map. It is much nicer than taking the average.
By individual and collective I mean the following: the population we're looking at is made up of units that each have a property described by . Moreover, the units are organised into collectives that partition the population. The random variable denotes the property of a randomly drawn unit along with an 'identifier' of the collective that unit is part of. The map turns the identifier into the composition of the collective it denotes, given as a distribution over .
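As a toy illustration of what I mean, in Python (all names hypothetical, just to fix ideas):

```python
# A population partitioned into collectives; each unit has a phenotype value.
collectives = {
    "A": [0.1, 0.1, 0.3],
    "B": [0.5, 0.7],
}

def collective_composition(identifier):
    """Turn a collective identifier into the composition of that collective,
    given as the empirical distribution over phenotype values."""
    members = collectives[identifier]
    return {v: members.count(v) / len(members) for v in set(members)}

print(collective_composition("A"))  # {0.1: 0.666..., 0.3: 0.333...}
```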
I have a manuscript I wrote last year and failed to publish so far. I gave up trying for now, and that's fine. In the manuscript I am trying to make the point I am aiming for with this discussion, but in plain language and along basic calculations. The value of the manuscript lies not in mathematical insights but, I hope, in the clarifications of certain concepts and methods in evolutionary biology. I am not sure how much sense it makes to someone not familiar with the particular research questions, nor am I sure about the correctness of the arguments. I do think, however, that the intuition behind these diagrams (Model2.jpg) is adequately described in the manuscript albeit in different terms.
I have been contemplating for a while whether to post the manuscript on a preprint server. My supervisor and coauthor gave his permission. If you would like to read the manuscript, I would be happy to post it.
Tobias Fritz said:
The bijective correspondence between deterministic and general then lives on top of the string diagrams and doesn't really interact with them. I'm pretty sure that there are ways to do better, in the sense that one can probably have a graphical calculus in which that correspondence is itself part of the graphical syntax in an intuitive way, for example by using things like functorial boxes. But I don't think that this has been worked out yet.
I've been thinking about this sort of thing recently. I don't know how helpful this is for the discussion, but here's how I'd draw what you describe above. Note that I haven't worked out anything formally; this is just pictures. (Not that being pictures makes them informal; I just haven't gone through and worked out exactly what axioms everything should obey, and it's possible something ends up breaking the whole thing.)
Following your paper with Paolo, "Bimonoidal Structures of Probability Monads", we represent the object PX as a "tube" surrounding X, like this (X on the left, PX on the right):
Then a monad consists of families of morphisms $\eta_X \colon X \to PX$ and $\mu_X \colon PPX \to PX$
such that
These are just string diagram representations of the usual commutative square and triangle, asserting that the various ways of going from $PPPX$ to $PX$ and from $PX$ to $PX$ should be equal.
Since we're in a Markov category and want to distinguish between stochastic and deterministic morphisms, I'll draw stochastic morphisms with a curved edge and deterministic ones as square, like this
We want to make an equivalence between stochastic morphisms of the form $X \to Y$ and deterministic morphisms of the form $X \to PY$, as drawn above.
In a suitable class of Markov categories (I guess actually in any Kleisli category) we will have another family of canonical morphisms, the "sampling" operation $\mathrm{samp}_X \colon PX \to X$ for each $X$, which maps points of $PX$ (i.e. distributions) stochastically to points of $X$. Let's draw that like this:
Then we can simply write
which I find quite pleasing.
In the other direction we have
which could be taken as the definition of $f^\sharp$. In symbols this says $f^\sharp = Pf \circ \delta$. Although $f$ is a stochastic morphism, we can regard $Pf$ as a deterministic map, given by the Chapman-Kolmogorov equation.
We should also have these equations for how $\mathrm{samp}$ interacts with $\eta$ and $\mu$:
They look like simplified versions of the monad laws, which makes some intuitive sense to me, because in the Kleisli category every object is really an object of the form $PX$ in the base category. So these are actually the monad laws, just with one level of application of $P$ removed. Because of this, I'd guess that all the stuff in the bimonoidal structures paper will also work in this context, but I haven't worked through it.
I've been using notation like this informally for a while. It seems to be quite useful, because it combines the convenience of Markov categories with the ability to consider distributions explicitly when needed.
For more on the "tube diagram" notation there are a couple of blog posts by Joe Moeller, at https://joemathjoe.wordpress.com/2020/06/23/a-different-string-presentation-of-monads/ and https://joemathjoe.wordpress.com/2020/07/09/tube-diagrams-for-monoidal-monads/, as well as the paper by Tobias and Paolo, at https://arxiv.org/abs/1804.03527
Christoph Thies said:
... the intuition behind these diagrams (Model2.jpg) ...
I think I got this model all wrong. The map seems to refer to how the variables are utilised within and . Ultimately, we're interested in a model of the map . I'll have to think more about this.
Yep, functorial boxes and shadings are great! What seems to be missing so far is a complete set of rules for how they interact with the monoidal structure and Markov category structure; if such a thing was available, then we'd certainly be using it already, and probably Christoph and some others would do so as well. So if you or someone else were to propose a complete string diagram calculus, say for affine symmetric monoidal monads on cartesian monoidal categories, then that would come in very useful! One thing to keep in mind is that the string diagrams in our bimonoidal structures paper are at the level of the original category, meaning that the diagrams depict everything at the level of deterministic morphisms, while in this thread we all seem to be using string diagrams in a Kleisli/Markov category.
Yes, Tomáš has also proposed to use a separate box style for deterministic morphisms. This could also be useful, but there are a couple of caveats that make me personally uncertain about whether it should really be done:
1) What if a morphism is not known to be deterministic a priori, but later on in the course of a proof is shown to be deterministic? Does it then get denoted differently, and could that be confusing?
2) What if neither morphism in a given diagram is deterministic, but a certain composite or subdiagram is?
3) On a vaguely related note, in the work that we're currently doing on the comparison of statistical experiments, it's becoming increasingly clear that properties holding merely "almost surely" come up a lot, as do almost surely deterministic morphisms.
Perhaps there's a more elaborate notation to take care of the latter two points?
@Christoph Thies, I can now follow the explanation of individuals and collectives and understand how it models population biology. I still think that achieves the same thing as . If you use the identifier of a collective as input to , then you simply get a random element of as output, and if you use the same input many times, then you get different elements sampled from the corresponding distribution. That's why and the composition are one and the same Kleisli morphism. Right? Of course this is not really specific to probability theory or Markov categories but part of the formalism of Kleisli categories in general.
I'll answer quickly because I have to go, please excuse mistakes. The reason I would like to think of the output of as element of is that a sample that is taken subsequent to has attached to it the distribution over that characterises the collective the sample is part of. The next step (the convolution) is the interaction between a collective effect computed from the attached distribution and an individual effect computed from the property of the sample itself. If and the computations and interaction within are identities and addition, resp., the difference between and might not matter.
On having a special style for deterministic morphisms, I tend to use the square box for "known to be deterministic" and the rounded edge for "possibly stochastic." If a composite turned out to be deterministic, I'd just write it as something like
I see it more as a typographical convention than a formal thing - I find it makes the diagrams easier to read in my paper notes.
I keep thinking there should be a better notational way to take care of "almost surely" in general, but I haven't hit on it yet.
On the Kleisli category versus the original category, what I was thinking this morning was that, if we want to, we can restrict the domain of P to its Kleisli category, and then we end up with a monad defined on the Kleisli category instead of the original category, and we should be able to use a similar graphical calculus for that. I speculated that a lot of the stuff from the bimonoidal structures paper will carry over to that context, but I agree that that work needs to be done.
Christoph Thies said:
I'll answer quickly because I have to go, please excuse mistakes. The reason I would like to think of the output of as element of is that a sample that is taken subsequent to has attached to it the distribution over that characterises the collective the sample is part of. The next step (the convolution) is the interaction between a collective effect computed from the attached distribution and an individual effect computed from the property of the sample itself. If and the computations and interaction within are identities and addition, resp., the difference between and might not matter.
Okay, great! If the collective effect computed from the distribution depends on the distribution in a nonlinear way, then I agree that the distribution itself will have to be used as input. Whereas if the effect depends on the distribution linearly, then it can be computed by sampling from the distribution first and then using the resulting element of $X$ as input to the effect; because then the overall effect is precisely the one given by taking the expectation over all the samples, and the Kleisli composition takes care of the formation of that expectation for you.
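A finite toy example of the linear/nonlinear distinction (just a sketch; the numbers are made up): the mean of a distribution can be computed through sampling, while the variance cannot be read off from a single sample:

```python
import numpy as np

rng = np.random.default_rng(1)
support = np.array([0.0, 1.0, 2.0])
p = np.array([0.2, 0.5, 0.3])  # a distribution on the support

# Linear effect (the mean): sample first, then average; the Kleisli
# composition takes care of this expectation for you.
mean_direct = float(p @ support)
mean_via_sampling = rng.choice(support, p=p, size=200_000).mean()
print(mean_direct, round(float(mean_via_sampling), 3))  # both ~1.1

# Nonlinear effect (the variance): a single sampled element of X carries
# no information about it, so here one needs the distribution itself.
variance = float(p @ support**2) - mean_direct**2
print(variance)  # 0.49
```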
I imagine that there are plenty of effects in population biology which depend on the distribution in a nonlinear way. And this is the case in your situation? For example, I guess a diverse population has higher fitness than a uniform one, so that the fitness is a nonlinear function of the distribution? Is this more or less right? (Apologies if I'm using the terms incorrectly; I know that fitness usually refers to individuals, so perhaps I should be referring to something like adaptability at the population level when trying to express the advantage of diversity?)
Nathaniel Virgo said:
On having a special style for deterministic morphisms, I tend to use the square box for "known to be deterministic" and the rounded edge for "possibly stochastic." If a composite turned out to be deterministic, I'd just write it as something like
I see it more as a typographical convention than a formal thing - I find it makes the diagrams easier to read.
Cool. So then in the situation of the following statement in Infinite products and zero-one laws in categorical probability,
would you keep the phrase " is deterministic" as it is, since expressing it string-diagrammatically would not simplify anything, and use a separate notation for deterministic morphisms only when it can clearly help the reader? That sounds like something worth considering.
On the Kleisli category versus the original category, what I was thinking this morning was that, if we want to, we can restrict the domain of P to its Kleisli category, and then we end up with a monad defined on the Kleisli category instead of the original category, and we should be able to use a similar graphical calculus for that. I speculated that a lot of the stuff from the bimonoidal structures paper will carry over to that context, but I agree that that work needs to be done.
Right. One thing to be careful with is that a monad usually does not extend to a monad on its Kleisli category, as I've had to learn the hard way by being confused about it and then being corrected by my coauthors. The (only) thing that fails is the naturality of the unit! In the probability monad context, when you compose a non-deterministic Markov kernel $f \colon X \to Y$ with $\delta_Y$, then the composite returns a random delta distribution on $Y$; but the other composite $Pf \circ \delta_X$ is actually deterministic, and its image is not contained in the delta distributions. The two coincide only after composing with the sampling map $\mathrm{samp}$.
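In symbols, for a non-deterministic Kleisli morphism $f \colon X \to Y$ (notation as above, and modulo my glossing over where exactly each composite lives):
$$\delta_Y \circ f \;\neq\; Pf \circ \delta_X \quad \text{in general}, \qquad \text{but} \qquad \mathrm{samp}_Y \circ \delta_Y \circ f \;=\; f \;=\; \mathrm{samp}_Y \circ Pf \circ \delta_X.$$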
Tobias Fritz said:
Okay, great! If the collective effect computed from the distribution depends on the distribution in a nonlinear way, then I agree that the distribution itself will have to be used as input. Whereas if the effect depends on the distribution linearly, then it can be computed by sampling from the distribution first and then using the resulting element of $X$ as input to the effect; because then the overall effect is precisely the one given by taking the expectation over all the samples, and the Kleisli composition takes care of the formation of that expectation for you.
That seems correct to me.
Tobias Fritz said:
I imagine that there are plenty of effects in population biology which depend on the distribution in a nonlinear way.
Yes, higher-order effects. For example, not only do the units subject to causal processes evolve, but so do the units that constitute those processes.
Tobias Fritz said:
And this is the case in your situation?
For now I don't need this, linear is sufficient. My goal is to recreate the multilevel Price equation, an equation that formalises the biological process of selection, in category-theoretic terms. The Price equation is equivalent to a linear regression.
Tobias Fritz said:
For example, I guess a diverse population has higher fitness than a uniform one, so that the fitness is a nonlinear function of the distribution?
Yes, that would be an example where knowledge of the average is insufficient to determine fitness.
Tobias Fritz said:
Is this more or less right?
Yes, perfect!
Tobias Fritz said:
Apologies if I'm using the terms incorrectly
That's fine. Also, many terms are not clearly defined.
Tobias Fritz said:
I know that fitness usually refers to individuals, so perhaps I should be referring to something like adaptability at the population level when trying to express the advantage of diversity?
That's a far-reaching question. What replication could mean on higher levels and how it could be formalised is largely unclear. Let's think about this once we're done with selection!
I have a new version of the model. It's surely not without mistakes but it looks like a big step to me. It seems to do what I hoped for and more. The diagram shows the PA version of the equations above. In the diagram, and denote collective and individual fitness, resp.; .
In the CA version, the right leg just applies and discards . The left and right leg of the diagram represent and , resp. ( is discarded in both legs). The intuition behind is the following: In , the collective phenotype (output of ) is evaluated in . (sorry for the notation) converts the outcome to the corresponding distribution over . In , the individual phenotype interacts with the collective phenotype at the convolution. The output is evaluated in to give the (relative?) distribution over . The mixture combines the two copies of the system.
I'm sure something is wrong around . I think it's to do with normalisation and the fact that the collective distributions are not full distributions (not summing up to one).
I have to admit I am somewhat overwhelmed by how much sense this makes, @Tobias Fritz. Everything fits together. I feel compelled to post my manuscript on BioRxiv now, as a draft. Do you think this might be a bad idea? I'd post it as I wrote it last year, without category theory.
Christoph, I'm sorry, but I personally am not competent to comment on a manuscript outside of my areas of expertise. Perhaps you can ask another mathematical biologist who has studied the Price equation for feedback? For example, Matteo Smerlak and his coauthors have worked on mathematically sophisticated approaches to evolution involving probabilistic dynamics, for example in Limiting fitness distributions in evolutionary dynamics. Perhaps they would be able to comment? In any case, please let us know if/when you post it, as I'd be curious to take a look and learn a bit more about it, even if I won't be able to assess its merits.
I'll just post it here then, for now: CAPA.pdf
That looks like a really nice paper! I don't think that I'll be able to read it in detail, but the parts that I've read (in particular the introduction) are quite interesting and made good sense. So I very much hope that this will be of interest to mathematical biologists as well!
Thank you, @Tobias Fritz !
I would like to fix the left leg of the diagram above. With the incorrect composition ( with and ) I am trying to say that collective selection acts on but is determined by the output of . How can I express this?
Now I'm admittedly getting more confused. I thought that your was a morphism , namely the sampling map? But now the input of is . I also don't know what and are.
I am sorry for the confusion, @Tobias Fritz. I didn't write this down correctly. Let's see the maps involved:
I was thinking of and as somehow representing the part that is left to explain in the complete process. Everything else seems specified. We have and I thought also . Now it seems needs another input to determine the mapping that acts on , like this Model4.png.
But then it would seem that the same is required for : one input determines the function that acts on the other. And both inputs are identical, i.e. belong to the same individual. That's nice. Does it make sense?
Like this: Model5.png
Where is my regression? :rolling_eyes:
Well, as I've pointed out a number of times before, we have , so it seems to me that this coincides with what you now denote .
So your and are the same components of the model as the morphisms that you had previously denoted and ?
Tobias Fritz said:
Well, as I've pointed out a number of times before, we have , so it seems to me that this coincides with what you now denote .
Yes. I think that's ok. In the additive case any random element of the associated collective will probably do. I was getting ahead of myself talking about functions of distributions.
So your and are the same components of the model as the morphisms that you had previously denoted and ?
No, it's like this, I think: Model5.1.png
Okay! Then I'm not sure why you use two different symbols to denote the same morphism, but otherwise it makes sense to me :smile:
Nice!
Tobias Fritz said:
I'm not sure why you use two different symbols to denote the same morphism
Which symbols are you referring to?
but otherwise it makes sense to me :smile:
That makes me very happy!
Great! I thought that we had agreed that and denote the same morphism because they're both equal to . That's what I've been referring to.
I see. Yes. Here's both versions: Model5-PA.png, Model5-CA.png.
I think my regression is not far away. Consider in CA. Suppose acts on the left input with the right input controlling the mapping. For a sample we therefore get a map . Since our individuals breed true (no mutation, i.e., offspring cannot differ from their parents in phenotype ) and we have no migration (no influx of -values not previously present in the population) with the projection we can assume
Therefore we have a Radon-Nikodym derivative . For we have , ready for regression!
Sorry, @Tobias Fritz, I messed it up completely! The RN derivative is given by
Does this make sense?
How convenient that the RN derivative automatically yields exponential behaviour of the frequencies in when iterated.
Christoph, I'm afraid that I'll have to take a break from the discussion (for now) - I'm moving to Austria! And organizing things is now starting to keep me quite busy.
Yes, sure, Tobias. Those last days chatting with you were quite exciting for me. I apologise if I was not considerate towards your time. All the best for your move! Austria is very nice.
If I may ask, do you think you'll be around again anytime soon? I need to finish my PhD thesis before long and reporting the things we discussed here would be very useful for me. It seems to me I'm not far away but I need your help :see_no_evil:
In case you are interested, here are all three videos by Prakash: https://www.youtube.com/playlist?list=PLaILTSnVfqtI6MDWQUqB2mIhx1USzXkj4
Tomáš Gonda said:
Does anyone know of a theorem in categorical probability that could be regarded as a categorical version of the Radon-Nikodym Theorem? I have wondered about this a couple of times, but a short literature search never turned up a result I'd be happy with.
I don't know if it has already been mentioned in this thread (I didn't read all of it), but Bunge & Funk describe a topos-theoretic Radon-Nikodym theorem in their book Singular Coverings of Toposes.
Hi @Alexander Gietelink Oldenziel, could you say which theorem in Bunge&Funk's book you mean?
Peter Arndt said:
Hi Alexander Gietelink Oldenziel, could you say which theorem in Bunge&Funk's book you mean?
Hi Peter! I was thinking of section 6.2, about inverting distributions.
There is another paper where Marta Bunge explicitly says it is an analog of Radon-Nikodym.
I spent some time thinking about analogies of conditional probability and sigma algebras in this context. We can talk a little about it if you want, though I didn't get very far.
Ah, wow, looks like quite a journey from classical Radon-Nikodym to that chapter!
Yes, I would love to talk about that, just need to find some time...
Christoph Thies said:
Here's both versions: Model5-PA.png, Model5-CA.png.
Hello,
I have been thinking more about the equations I tried to build before. Using diagrams like those by @Nathaniel Virgo above, the collective is now represented explicitly in terms of the monad.
In an experimental setting, individuals are organised into collectives that in turn make up the population. The equations describe an episode of selection that acts on both the individual and the collective phenotypes.
The element of on the left comes about as follows, I think. A collection of collectives of individuals is given as a collection of distributions over , the space of individual phenotypes, that is an element of a coproduct . The monad unit induces a map . This situates the multilevel Price equation, as in Gardner, A., "The genetical theory of multilevel selection", Journal of Evolutionary Biology, 2015, 28, 305-319, Equation (5) (the author considers the genetic value as phenotype), in the context of the diagram below.
In the lower branch in the monad diagrams above, collective composition, i.e., the inner distributions, should remain unchanged. The lower branch therefore has a side branch that keeps the inner tube so that it can be restored after . This looks a little awkward. Is there a more elegant way to represent this invariance? Could the inner distributions tunnel through the box ? :caterpillar:
I made some progress on this and wonder if someone is interested or would have a look to point out mistakes.
Consider the probability monad and . Then I'd like to write the two models sketched above as follows.
CAPAInMonads.png
Moreover, the map (and, similarly, ) satisfies the diagrams below.
MonadHomomorphism.png
The latter diagrams seem similar to those in the definition of morphisms of monads on the nLab (https://ncatlab.org/nlab/show/monad, Section "The bicategory of monads"), but I can't follow the description there. Is it correct to say that is a morphism of monads with 1-cell (and a 2-cell that I cannot write but that seems to be the identity as well)?
What would help you follow the definition in the nLab?
A morphism of monads is first of all a natural transformation. Do you have such a map for all objects , or just for one?
John Baez said:
What would help you follow the definition in the nLab?
I suppose I'd have to learn what exactly bicategories are. I dodged this so far as I am afraid they'll drag me in further. It seems to always make sense to think beyond.
Okay, you don't need to know what a bicategory is. If you're trying to understand a morphism of monads, that doesn't matter much.
That's what I was hoping! Could you point me to a reference that describes morphisms of monads without bicategories?
No. I'm sure one exists; I just don't know it. I would just look at the nLab page's definition of "morphism of monads", which does not require that you know about bicategories.
Take that definition, and where they say "1-cell", read "functor". Where they say "2-cell", read "natural transformation".
Where they say "monad in K" read "monad in Cat", i.e. plain old monad.
I'll try that. Thank you!
I typed "monad morphism" into Google and instantly got this:
https://mathoverflow.net/questions/92093/functors-between-monads-what-are-these-really-called
This is a guy who defines morphisms of monads without knowing what they're called.
His "natural map" must be a natural transformation.
With luck this definition will exactly match the nLab definition if you translate between the terminologies. With luck the key equations will agree. If you can get them to match up, you've probably got the right idea.
Paolo Perrone said:
A morphism of monads is first of all a natural transformation. Do you have such a map for all objects , or just for one?
I have a map for one , but the construction works for any .
To explain why these diagrams are relevant, I'll describe the map that I'll call from now on (forget about as well). It is given by scaling the distribution pointwise and then normalising. With and normalisation ( is the unnormalised monad and is the inclusion), is given by
This construction makes satisfy the diagram below because normalisation reverses the scaling.
Unit.png
also satisfies the second diagram. To see this I did calculations similar to those you demonstrated in your recent talk on partial evaluations (https://www.youtube.com/watch?v=ynxfrlqr4I0).
Multiplication.png
How exactly do you scale the distribution pointwise? Could you give an example?
Paolo Perrone said:
How exactly do you scale the distribution pointwise? Could you give an example?
For with
But then comes normalisation and, as I see now, the last diagram is not generally satisfied. That's just as well, because the following two diagrams would otherwise seem to say the same thing.
Multiplication.png
MultiplicationString.png
I would now draw the two versions of the process as follows.
CAPAInMonads2.png
In the right hand version (PA), it is necessary that does not slide out of the tube! In fact, the whole point of the distinction is that in CA, is applied across the metapopulation, and in PA, is applied within the populations.
I'm quite convinced about the diagram for the unit, though.
Unit.png
UnitString.png
It says (I think) that there is no mutation or other funny stuff happening in that creates novel things, i.e., that increases the support of the distribution.
What I still would like to say but don't know how is that leaves the inner expression unchanged.
I got myself into a bit of a pickle with the names, which I'd like to sort out. Below is an overview in which the processes on the right refine the process on the left.
CAPAOverview-1.png
More specifically, there are maps and such that is given by
and is given by
Moreover, and satisfy the diagrams below.
Unit1.png
Unit2.png
The latter equality seems to say that leaves the inner distributions unchanged.