Category Theory
Zulip Server
Archive

You're reading the public-facing archive of the Category Theory Zulip server.
To join the server you need an invite. Anybody can get an invite by contacting Matteo Capucci at name dot surname at gmail dot com.
For all things related to this archive refer to the same person.


Stream: event: Categorical Probability and Statistics 2020 workshop

Topic: Categorical Radon-Nikodym


view this post on Zulip Tomáš Gonda (Jun 05 2020 at 22:40):

Does anyone know of a theorem in categorical probability that could be regarded as a categorical version of the Radon-Nikodym Theorem? I have wondered about this a couple of times, but a short literature search never turned up a result I'd be happy with.

view this post on Zulip Paolo Perrone (Jun 05 2020 at 22:53):

Prakash Panangaden gave a very nice talk at UCR where he used the theorem almost synthetically: https://categorytheory.zulipchat.com/#narrow/stream/229966-ACT.40UCR-seminar/topic/April.208th.3A.20Prakash.20Panangaden

view this post on Zulip Tobias Fritz (Jun 13 2020 at 19:56):

This question has been on my mind over the past week, and I'd now like to give a partial answer from the Markov categories perspective. Naively, one might think that Markov categories are not expressive enough to talk about densities and the Radon-Nikodym theorem. I think that this is largely true, but one can sidestep the appeal to densities and the Radon-Nikodym theorem to some extent. (And there may also be a possibility to build in densities natively into the framework, but I wouldn't yet know how to do that.)

But first, why are densities important? I can see two main reasons besides the Radon-Nikodym theorem:

The opposite variance of densities is related to the fact that Bayesian inversion can be described as a dagger functor, as explained in Remark 13.10 of my Markov cats paper. So while I don't know how to formulate (let alone prove) a general Radon-Nikodym theorem for Markov categories, there is a more particular construction which works in any Markov category with conditionals. Namely if $f : X \to Y$ is any measurable map, $\mu$ a probability measure on $X$ and $\nu$ a probability measure on $Y$ with $\nu \ll f_\ast \mu$, then we can form a new measure on $X$ given by $f^\ast\big(\frac{d\nu}{df_\ast\mu}\big)\cdot\mu$. Here, $f^\ast$ denotes the pullback of functions by composition as above.

How can we construct this new measure using only the Markov category structure? This is possible because that measure turns out to be given by the composition $f^\dag_\mu \nu$, where $f^\dag_\mu : Y \to X$ is a Bayesian inverse of $f$ with respect to $\mu$; showing this just requires some calculation, but I think it is closely related to the construction of conditional expectations in terms of the Radon-Nikodym theorem. This measure $f^\dag_\mu \nu$ can be shown to be well-defined, i.e. independent of the particular choice of Bayesian inverse, as soon as $\nu \ll f_\ast \mu$ holds; in a Markov cat, this means by definition that $f_\ast\mu$-a.s. equality of morphisms out of $Y$ must imply $\nu$-a.s. equality. So in particular types of situations, we can get around the Radon-Nikodym theorem in a way which makes sense in any Markov category with conditionals.
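In the finite discrete case, the agreement of the two constructions can be sanity-checked directly. Here is a toy Python sketch (my own encoding, not from any paper): finite sets stand in for measurable spaces, $f$ is a deterministic map, and the Bayesian inverse is just Bayes' rule.

```python
# Toy discrete check (my own encoding) that the classical measure
# f^*(dnu/d(f_* mu)) . mu equals the composite (f^dagger_mu) nu.
X = ['x1', 'x2', 'x3']
Y = ['y1', 'y2']
f = {'x1': 'y1', 'x2': 'y1', 'x3': 'y2'}   # measurable map f : X -> Y
mu = {'x1': 0.2, 'x2': 0.3, 'x3': 0.5}     # probability measure on X
nu = {'y1': 0.9, 'y2': 0.1}                # probability measure on Y

# pushforward f_* mu
f_mu = {y: sum(mu[x] for x in X if f[x] == y) for y in Y}
assert all(f_mu[y] > 0 for y in Y if nu[y] > 0)   # nu << f_* mu

# classical construction: pull back the density dnu/d(f_* mu), multiply by mu
classical = {x: (nu[f[x]] / f_mu[f[x]]) * mu[x] for x in X}

# Bayesian inverse of f with respect to mu: Bayes' rule as a kernel Y -> X
def f_dagger(y):
    return {x: (mu[x] if f[x] == y else 0.0) / f_mu[y] for x in X}

# composite measure (f^dagger_mu) nu on X
composite = {x: sum(f_dagger(y)[x] * nu[y] for y in Y) for x in X}

for x in X:
    assert abs(classical[x] - composite[x]) < 1e-12
print(composite)  # approx {'x1': 0.36, 'x2': 0.54, 'x3': 0.1}
```

All names here (`f_dagger`, `classical`, `composite`) are illustrative; the point is only that Bayes' rule reproduces the density-based construction in this toy setting.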

This construction is a special case of letting a Radon-Nikodym derivative act on a measure by multiplication, and may thus seem a bit removed from the Radon-Nikodym theorem itself. This is true, but there still seem to be important applications of this special case. In particular, the abstract Fisher-Neyman factorization theorem (Theorem 14.5 of my paper) uses this type of construction (although this is not explained in the paper because I didn't know this at the time of writing.)

I'm not sure to what extent other applications of the Radon-Nikodym theorem can be sidestepped like this. It's hard to imagine that all of them can be. For example, one may ask whether for given measures $\mu,\nu,\rho$ on the same measurable space with $\mu,\nu \ll \rho$, the new measure $\frac{d\mu}{d\rho} \cdot \nu$ would be similarly constructible from the Markov category structure and conditionals only. I have a sketch of a proof that this is not possible in a generic Markov category with conditionals.

view this post on Zulip Sam Staton (Jun 14 2020 at 06:23):

Nice. Can you say a bit more about how you interpret $\nu\ll\mu$ in a Markov category? Sorry, I may have missed it in your article.

In the category of s-finite kernels, the morphisms $X\to 1$ amount to measurable functions $X\to[0,\infty]$. So I think one can easily talk about a Radon-Nikodym derivative of a measure $1\xrightarrow{\mu} X$ with respect to a measure $1\xrightarrow{\nu} X$ as a morphism $f:X\to 1$ such that $\mu = \big(1\xrightarrow{\nu} X\xrightarrow{(f,\mathrm{id})} 1 \otimes X\cong X\big)$. (This looks especially easy in your diagrammatic notation.) But I haven't yet tried to phrase/prove the RN theorem in this abstract categorical setting of synthetic measure theory.
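In the finite discrete case this is easy to spell out concretely. A toy sketch (my own encoding): a measure on a finite set $X$ is a dict, a "morphism $X \to 1$" is a $[0,\infty]$-valued function on $X$, and the equation above becomes $\mu(x) = f(x)\cdot\nu(x)$ pointwise.

```python
# Toy discrete encoding (my own) of an RN derivative as a morphism X -> 1:
# mu = (f, id)_* nu means mu(x) = f(x) * nu(x) for every point x.
nu = {'a': 0.5, 'b': 0.25, 'c': 0.25}
mu = {'a': 0.1, 'b': 0.4, 'c': 0.5}

# Radon-Nikodym derivative dmu/dnu, defined where nu is positive
f = {x: mu[x] / nu[x] for x in nu}

# check that weighting nu pointwise by f recovers mu
for x in nu:
    assert abs(f[x] * nu[x] - mu[x]) < 1e-12
print(f)  # approx {'a': 0.2, 'b': 1.6, 'c': 2.0}
```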

view this post on Zulip Tobias Fritz (Jun 14 2020 at 12:42):

Interesting! That sounds like a good reason to consider categories like Markov categories, but where the terminality of $I$ is dropped (and replaced by the mere existence of a non-natural unit effect; I think @Arthur Parzygnat has been working with this definition).

Either way, $\nu \ll \mu$ for $\mu,\nu : I \to X$ can be defined to mean the following: if $f =_{\mu\text{-a.s.}} g$ for any two parallel morphisms $f$ and $g$ out of $X$, then also $f =_{\nu\text{-a.s.}} g$. To see that this is equivalent to the standard definition in the Kleisli category of the Giry monad (on all measurable spaces), just take $f = 0$ and $g$ the indicator function of a possible null set; the other direction seems to follow most easily by using the RN theorem. In fact, this synthetic definition makes sense for any two morphisms $\mu : A \to X$ and $\nu : B \to X$ with the same codomain $X$, and I believe that the semantics of the condition in the Kleisli category of the Giry monad is then similar, amounting to $\mu(S|a) = 0 \:\forall a \: \Rightarrow\: \nu(S|b) = 0 \:\forall b$. Perhaps this still holds in the category of s-finite kernels? (BTW, this definition was not in my paper yet, so you can't possibly have missed it.)
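In the finite discrete case, this synthetic definition can be checked against the standard one. A toy Python sketch (my own formalization; for finite sets, indicator functions of points suffice as test functions, and $f =_{\mu\text{-a.s.}} g$ just means equality on the support of $\mu$):

```python
# Toy finite-discrete check (my own encoding): the synthetic "nu << mu"
# (mu-a.s. equality implies nu-a.s. equality) reduces to supp(nu) being
# contained in supp(mu), i.e. standard absolute continuity.

def supp(m):
    return {x for x, p in m.items() if p > 0}

def as_equal(m, f, g):
    """f =_{m-a.s.} g for functions given as dicts on a finite set."""
    return all(f[x] == g[x] for x in supp(m))

def synthetically_abs_cont(nu, mu, X):
    # indicator functions of points vs. the zero function are enough tests here
    tests = [({x: 1 if x == x0 else 0 for x in X}, {x: 0 for x in X}) for x0 in X]
    return all(as_equal(nu, f, g) for f, g in tests if as_equal(mu, f, g))

X = ['a', 'b', 'c']
mu = {'a': 0.5, 'b': 0.5, 'c': 0.0}
rho = {'a': 0.0, 'b': 0.5, 'c': 0.5}  # charges c, which mu does not
nu = {'a': 1.0, 'b': 0.0, 'c': 0.0}   # supp(nu) is contained in supp(mu)

assert synthetically_abs_cont(nu, mu, X)       # nu << mu holds
assert not synthetically_abs_cont(rho, mu, X)  # rho << mu fails
```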

Now following your idea of how densities act on measures, we can easily express the RN theorem synthetically, although I'm "almost sure" that it can't be proven to hold from just the categorical structure and the existence of conditionals. However, when expressed diagrammatically like this, the RN theorem acquires the flavour of a factorization property vaguely reminiscent of the existence of conditionals. So I wonder:

Puzzle: Do the existence of conditionals and the RN theorem have a common generalization?

If so, their common generalization may be a very natural candidate axiom for synthetic measure/probability theory. Or am I completely off track here?

view this post on Zulip Sam Staton (Jun 15 2020 at 20:11):

Thanks @Tobias Fritz. Matthijs Vákár and Luke Ong worked out the theory of R-N derivatives and conditional probabilities for s-finite kernels here.

They give a general theorem for conditional probability that includes R-N derivatives as a special case (Theorem 13 and Remark 2). It would be great to write this in abstract categorical language. Is that the kind of thing you were thinking of in your puzzle?

But they use special forms of almost-sure equality and absolute continuity to deal with the possible infinities. Maybe this is your categorical formulation translated directly to the category of s-finite kernels, but I am not yet sure.

view this post on Zulip Tobias Fritz (Jun 19 2020 at 13:00):

@Sam Staton Wow! Yes, that indeed looks like a very nice answer to my puzzle.

Just to make sure I understand: in the extra condition just after footnote 7 of Ong and Vákár, I assume that $x \in X$ should be $z \in Z$, right? I've been trying to see how this condition could possibly be implied by my synthetic definition of absolute continuity proposed above, when applied to the category of s-finite kernels. What I seem to be getting (with $\phi = \mathrm{id}$ and $Z = I$ for simplicity) is that $\mu \ll \nu$ is equivalent to: for all measurable $S$ and $T$, we have that $\nu(S) = \nu(T)$ implies $\mu(S) = \mu(T)$. Hence I do not know how to state the generalized disintegration theorem of Ong and Vákár synthetically. Perhaps it's possible upon tweaking the definitions on either side a bit more.

view this post on Zulip Tomáš Gonda (Jun 20 2020 at 05:26):

That Vákár and Ong paper looks very nice indeed!

Does anyone know for a fact whether the category of s-finite kernels is an (unnormalized) Markov category in the sense of Tobias' paper?

Tobias Fritz said:

$\mu \ll \nu$ is equivalent to: for all measurable $S$ and $T$, we have that $\nu(S) = \nu(T)$ implies $\mu(S) = \mu(T)$.

I have no idea what you are referring to here to be honest, would you mind elaborating?

The assumption in Theorem 13 seems quite reasonable. As far as I understand it, events of both $0$-measure and $\infty$-measure are part of a "sink of no return" (SONR) in this formalism - erasing all possible distinctions that could be made by future inferences (the only exception being that the $\infty$ sink can leak into the $0$ sink, as $0 \cdot \infty = 0$ here). The assumption just says that the SONR of $\nu$ includes that of $\mu$, just as in the standard version of the result, right?

It seems to me that the most direct way to adapt the (aforementioned) synthetic definition of absolute continuity so that it coincides with this condition in Theorem 13 would be to impose an equality saying that copying an $\infty$-measure event is the same as a product of $\infty$-measure events. However, this doesn't seem like a good solution in general. A somewhat orthogonal approach would be to equip the objects with a structure akin to that of localizable measurable spaces, I guess.

view this post on Zulip Sam Staton (Jun 20 2020 at 13:34):

Hi @Tomáš Gonda, Yes, s-finite kernels do form an "unnormalized Markov category". I called it a commutative Freyd category in my paper.

Thanks @Tobias Fritz, I think you're right, the $x\in X$ should be $z\in Z$. I would also be interested to see how you derive that $\mu\ll\nu$ is equivalent to "$\nu(S)=\nu(T)$ implies $\mu(S)=\mu(T)$" (for your definition of $\ll$ in s-finite kernels, or in any unnormalized Markov category?). I tried to manipulate the definition but couldn't get very far.

Possibly an easier first thing to try is to consider bounded kernels, i.e. $k:X\times \Sigma_Y\to [0,\infty)$ such that $\exists n.\,\forall x.\,k(x,Y)<n$ [note the order of quantifiers]. I think these also form an unnormalized Markov category. This is less useful because you don't have countable coproducts, nor, I guess, can you expect to have Radon-Nikodym, because some densities are unbounded (e.g. beta(0.5, 0.5)). But it avoids the infinities for a moment.

view this post on Zulip Tobias Fritz (Jun 20 2020 at 16:38):

Tomáš Gonda said:

Tobias Fritz said:

$\mu \ll \nu$ is equivalent to: for all measurable $S$ and $T$, we have that $\nu(S) = \nu(T)$ implies $\mu(S) = \mu(T)$.

I have no idea what you are referring to here to be honest, would you mind elaborating?

Well, I didn't mean to make a definite claim there; my reasoning had been quite heuristic, and perhaps my indication of that wasn't clear enough. As I do things more carefully now, I unfortunately can no longer reproduce it. But fortunately, I now actually do recover the exact conditions of Theorem 13 of Ong and Vákár! Modulo one or two points that I'm not totally sure about, which I will mark separately as bullet points. Let me now go through the reasoning, again in the special case where $Z = I$ and $\phi = \mathrm{id}$.

Assuming that this holds, it implies the obvious s-finite analogue of Lemma 4.2 of my paper. As in Example 13.3, this can then be applied to characterize almost sure equality: we have $f =_{\nu} g$ if and only if

$$\int_U f(V|x) \, \nu(dx) = \int_U g(V|x) \, \nu(dx)$$

for all measurable sets $U$ and $V$ in the respective spaces. Some fiddling with the universal quantification over $V$, and using that $f$ and $g$ may land in the one-element space $I$ specifically, shows that the synthetic absolute continuity $\mu \ll \nu$ is equivalent to the following implication: for any two $[0,\infty]$-valued functions $f$ and $g$, if

$$\int_U f(x) \, \nu(dx) = \int_U g(x) \, \nu(dx)$$

holds for all $U$, then the same property holds with $\mu$ in place of $\nu$. Now we can interpret an equation like this as an equality of measures defined by densities $f$ and $g$ with respect to $\nu$. Thus by Theorem 9 of Ong and Vákár, the above equality is equivalent to the functions being $\nu$-almost everywhere equal on the finite part of $\nu$, while on the infinite part the set of points where one vanishes but the other one doesn't must have measure zero. Since it only matters where the functions are equal and where they vanish, it follows that it's enough to restrict to functions with values in, say, $\{0,1,2\}$.

Indeed the first condition arises by considering $f := 1_S$ and $g := 0$, and the second one by $f := 1_{\infty[\nu]\setminus\infty[\mu]}$ and $g := 2f$.

view this post on Zulip Sam Staton (Jun 22 2020 at 07:03):

Thanks Tobias, This is exciting.

Tobias Fritz said:

Maybe I misunderstood you, but the measurable rectangles generate the product sigma-algebra, so this is true for any measure?

I agree that this is the condition in Vákár-Ong when $Z=I$ and $\phi=\mathrm{id}$, but were you asking something different?

Also, do I understand you right:

and then the next step would be to check it's all still ok if we put $Z$ and $\phi$ back into the theorem, to get the general form that includes disintegration?

PS I mentioned this thread to Luke and Matthijs.

view this post on Zulip Tobias Fritz (Jun 22 2020 at 09:21):

Sam Staton said:

Maybe I misunderstood you, but the measurable rectangles generate the product sigma-algebra, so this is true for any measure?

Good, thanks. I wasn't sure if that applies because I had only ever worked with finite measures before.

Sam Staton said:

I agree that this is the condition in Vákár-Ong when $Z=I$ and $\phi=\mathrm{id}$, but were you asking something different?

Yes, I was asking about the final "some more fiddling" step in the proof, which I haven't done completely rigorously yet, but I know how it should go.

Sam Staton said:

Also, do I understand you right:

and then the next step would be to check it's all still ok if we put $Z$ and $\phi$ back into the theorem, to get the general form that includes disintegration?

Yes! Let me know in case that you'd like me to work out that general case of the argument, which should be fairly straightforward. I would then also produce a more streamlined and completely rigorous version of the proof (one can simplify it a bit by proving the two directions of the equivalence separately). Of course, if you or someone else were to do this I'd be happy about that too.

Sam Staton said:

PS I mentioned this thread to Luke and Matthijs.

Great! If they're interested in seeing it or in chiming in, we can get a new invite link from a moderator.

view this post on Zulip Christoph Thies (Jul 30 2020 at 08:56):

Hello,

I have some questions regarding Radon-Nikodym derivatives in Kleisli categories. I hope this is the right place to ask. Also, please excuse inaccuracies, I am not an expert :innocent:.

Given the Giry monad $\mathcal{P}$ on, say, the category of measurable spaces $\mathcal{C}$, suppose we have $Z\in\mathcal{C}$, $\omega\in\mathcal{P}Z$, and $f,g:\mathcal{P}Z\rightarrow \mathcal{P}Z$ with $f\omega\ll\omega$ and $g\omega\ll\omega$, so that the corresponding RN derivatives exist. Moreover, assume $(Z,+_Z)$ is a monoid. With $\text{copy}_Z : Z\rightarrow Z\otimes Z$ and the monoidal structure of the monad $\nabla :\mathcal{P}Z\otimes\mathcal{P}Z\rightarrow\mathcal{P}(Z\otimes Z)$ given by the product distribution, we have the map

$$T:\mathcal{P}Z\rightarrow\mathcal{P}Z\otimes\mathcal{P}Z\xrightarrow{f\otimes g}\mathcal{P}Z\otimes\mathcal{P}Z\xrightarrow{\nabla}\mathcal{P}(Z\otimes Z)\xrightarrow{\mathcal{P}+_Z}\mathcal{P}Z.$$

Does the structure of $T$ allow conclusions regarding $T\omega$? For example, does it imply $T\omega\ll\omega$? And if $\frac{\mathrm{d}T\omega}{\mathrm{d}\omega}$ exists, can it be represented in terms of $\frac{\mathrm{d}f\omega}{\mathrm{d}\omega}$ and $\frac{\mathrm{d}g\omega}{\mathrm{d}\omega}$?

view this post on Zulip Tobias Fritz (Jul 30 2020 at 16:41):

Hi Christoph! That's interesting stuff. The composite map $PZ \otimes PZ \stackrel{\nabla}{\longrightarrow} P(Z \otimes Z) \longrightarrow P(Z)$ is the convolution of probability measures on $Z$; for $\mu,\nu \in PZ$, let's denote their convolution by $\mu\ast\nu$, which is the usual notation used by analysts. Then how about a slightly simpler version of the question: does $\mu,\nu \ll \omega$ imply that $\mu \ast \nu \ll \omega$?

I think that the answer is no in general. For example, take $Z = \mathbb{R}$ considered as a monoid under addition. Like this, what convolution models is exactly the sum of independent random variables: the distribution of a sum $X + Y$ of independent $X$ and $Y$ is exactly the convolution of the distribution of $X$ with the distribution of $Y$. Now with $\mu = \nu = \omega$ being the uniform measure on $[-1,1]$, the convolution $\mu \ast \nu$ will not be supported on $[-1,1]$ only, but on $[-2,2]$ instead, meaning that $\mu \ast \nu \not\ll \omega$.
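A discrete analogue of this counterexample is easy to compute. The following toy sketch (my own choice of measures) uses the uniform measure on $\{-1,0,1\}$: the convolution charges $-2$ and $2$, which $\omega$ does not.

```python
# Discrete analogue (my own toy choices): omega uniform on {-1, 0, 1}; the
# convolution is the law of the sum of two independent draws, which charges
# -2 and 2 -- points of omega-measure zero -- so mu * nu is NOT << omega.
from collections import Counter
from itertools import product

omega = {-1: 1/3, 0: 1/3, 1: 1/3}
mu = nu = omega

conv = Counter()
for (x, p), (y, q) in product(mu.items(), nu.items()):
    conv[x + y] += p * q

print(dict(conv))  # supported on {-2, -1, 0, 1, 2}
assert conv[2] > 0 and omega.get(2, 0) == 0   # absolute continuity fails
```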

But perhaps you have specific situations in mind in which additional properties hold?

view this post on Zulip Christoph Thies (Jul 30 2020 at 18:59):

Thank you for your reply, @Tobias Fritz!

$Z=\mathbb{R}$ with addition is what I had in mind, but perhaps there is more choice involved. I am asking because I would like to understand how the effects of two independent processes on the same system are combined. However, I am not at all sure I went about it in the right way. The idea is that if two processes act independently on the same system, it might look like this:

$$\mathcal{P}Z\rightarrow\mathcal{P}Z\otimes\mathcal{P}Z\xrightarrow{f\otimes g}\mathcal{P}Z\otimes\mathcal{P}Z.$$

But then the two versions of the system must somehow be put together again, and the addition on $Z$ seemed a sensible way. What I would like to get for the Radon-Nikodym derivative, at least to first order, i.e., for the expectation value of $Z$, is

$$\frac{\mathrm{d}T\omega}{\mathrm{d}\omega} = \frac{\mathrm{d}f\omega}{\mathrm{d}\omega} + \frac{\mathrm{d}g\omega}{\mathrm{d}\omega},$$

which appears like an average of the two systems(?).

view this post on Zulip Tobias Fritz (Jul 30 2020 at 20:41):

Yes, using the convolution is a frequently used way to combine the effects of two independent processes on a system. It makes sense whenever the two effects are not only probabilistically independent but also non-interacting.

On $\mathbb{R}$, one will typically be working with ordinary probability density functions, which are just RN derivatives with respect to the usual (Lebesgue) measure on $\mathbb{R}$. If we denote this Lebesgue measure by $\omega$, then there's a nice formula for expressing the density of a convolution:

$$\frac{d(\mu\ast\nu)}{d\omega}(x) = \int \frac{d\mu}{d\omega}(y)\, \frac{d\nu}{d\omega}(x-y) \, d\omega(y).$$

Just adding the RN derivatives doesn't work, because the result is not even the RN derivative of a probability measure again: if you integrate, you'll see that it has a normalization of $1 + 1 = 2$. But if you put a factor of $1/2$ in front, then you get a well-defined probability measure again, which is known as the mixture.
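The normalization point, and the convolution density formula above, can both be checked numerically. A toy discretization (my own choice of two Gaussian densities on a grid; nothing here is specific to the discussion):

```python
# Numerical sanity check (my own toy discretization): summing two probability
# densities gives total mass 2, the 1/2-weighted sum is a probability density
# again (the mixture), and the discretized convolution density integrates to 1.
import numpy as np

xs = np.linspace(-5.0, 5.0, 2001)
dx = xs[1] - xs[0]

def gaussian(x, mean, sd):
    return np.exp(-(x - mean) ** 2 / (2 * sd ** 2)) / (sd * np.sqrt(2 * np.pi))

p = gaussian(xs, -1.0, 0.5)   # density dmu/domega w.r.t. Lebesgue measure
q = gaussian(xs, 2.0, 0.8)    # density dnu/domega

total = (p + q).sum() * dx    # naive sum of RN derivatives: mass 1 + 1 = 2
mixture = 0.5 * (p + q)       # the equal-weight mixture density
conv = np.convolve(p, q) * dx # Riemann-sum version of the convolution formula

assert abs(total - 2.0) < 1e-3
assert abs(mixture.sum() * dx - 1.0) < 1e-3
assert abs(conv.sum() * dx - 1.0) < 1e-2
```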

Note that the mixture makes sense independently of whether $Z$ is a monoid or not, as it doesn't use the addition on $Z$. So it's quite a different kind of thing from the convolution, both mathematically and in terms of what it means.

view this post on Zulip Christoph Thies (Jul 31 2020 at 11:36):

Yes, that makes sense, thank you, @Tobias Fritz! I will try to be more specific. I'm also happy to explain why I am interested in this setup but will stick to the topic for now.

It seems to me that both convolution and mixture are required, though I don't know how exactly. But I can be more specific about the processes $f$ and $g$ (which I will call $s$ and $t$ from now on). To understand their structure, another random variable $C$ must be taken into account. Intuitively, $C$ partitions the population into collectives, each with their own distribution over the range of $Z$: while the random variable $Z$ denotes the property of a sample, $C$ denotes the "collective" from which the sample was drawn. Now two processes $s$ and $t$ act on the population concurrently, see setup.jpg. Moreover, while these two processes are meant to be somewhat independent, there are two ways in which they may be assumed to be constrained (CA and PA), see CAPA_equations.pdf. In these diagrams, $\omega|_C:C\rightarrow Z$ is the conditional distribution as in Fritz (2019), "A synthetic approach to Markov kernels...". I will briefly explain the intuition behind these equations. Equation (a) (= equation (c)) says that $s$ doesn't change the composition of the collectives: the mapping $\omega|_C$ that gives the property distribution within a collective before $s$ occurs also gives the correct distribution afterwards ($\omega|_C = (s\omega)|_C$). While the 'size' of a collective may change under $s$, its composition doesn't. Equation (b) says the same the other way around: $t$ changes the distribution over the property $Z$ without regard to the collective. Finally, equation (d) says that $t$ doesn't affect $C$, and all changes to the distribution over $Z$ occur internal to the collectives.

I am sorry if my remarks are more confusing than helpful. I have much more to say about these things and am happy to explain further.

view this post on Zulip Tobias Fritz (Jul 31 2020 at 12:22):

Those are nice diagrams! You may well be the first one to use Markov category diagrams for actual mathematical modelling :smile: (Although computer scientists like @Sam Staton similarly use probabilistic programming languages for mathematical modelling, and those are often even more powerful and expressive, while the Markov categories framework is less powerful but very general.)

We've already discussed a bit how to deal with convolution by using the monoid structure $Z \otimes Z \to Z$, which you can use as just a box like $s$ and $t$ in the diagram. Taking a mixture works quite similarly: it's also a morphism $Z \otimes Z \to Z$ which merges the two inputs into one output, but it does something quite different: instead of adding up the inputs, it will randomly select one of its two inputs, one with probability $\lambda$ and the other with $1-\lambda$ for some parameter $\lambda$. It then uses that random selection as its output.
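At the level of samples, the contrast between the two boxes can be sketched in a few lines (my own toy implementation; `lam` stands for the parameter $\lambda$):

```python
# Toy sample-level sketch (my own) of the two Z (x) Z -> Z boxes: the mixture
# flips a lam-biased coin and passes one input through, while the monoid box
# used for convolution simply adds its inputs.
import random

def mixture(z1, z2, lam, rng):
    """Markov kernel Z (x) Z -> Z: return z1 with probability lam, else z2."""
    return z1 if rng.random() < lam else z2

def monoid_box(z1, z2):
    """The monoid structure Z (x) Z -> Z underlying convolution: addition."""
    return z1 + z2

rng = random.Random(0)
samples = [mixture(0.0, 1.0, lam=0.3, rng=rng) for _ in range(100_000)]
print(sum(samples) / len(samples))  # close to 1 - lam = 0.7
```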

Of course, whether either or both of these should be used will depend on your application. For example, one possibility could be to employ both by using the convolution on the two $Z$'s and the mixture on the two $C$'s, or the other way around.

view this post on Zulip Tobias Fritz (Jul 31 2020 at 12:40):

BTW the diagrams look sensible and interesting, so don't worry about them being confusing; they're very clear!

view this post on Zulip Christoph Thies (Jul 31 2020 at 13:00):

Thank you very much! I'll have to think about this.

view this post on Zulip Christoph Thies (Aug 01 2020 at 15:17):

I think I've got what I was hoping for. The attached diagram (Model.jpg) should work for any map $\mathrm{E}:\mathcal{P}Z\rightarrow Z$ and a convolution $*_{+_{Z}}:\mathcal{P}Z\otimes\mathcal{P}Z\rightarrow\mathcal{P}Z$ with respect to any monoid (perhaps abelian group) structure on $Z$. $M$ denotes the mixture @Tobias Fritz referred to above.

For $\mathrm{E}$ I am thinking of the average for now. Admittedly, I am not sure what to think of the mixture. But all seems to fit nicely. To complete the puzzle I need to assign a statistical model for the Radon-Nikodym derivative $w=\frac{\mathrm{d}T\omega}{\mathrm{d}\omega}:Z\rightarrow\mathbb{R}$ of the complete map $T:\mathcal{P}Z\rightarrow\mathcal{P}Z$. For an individual, the regression is supposed to predict $w$ based on the property $z$ and the property of the individual's collective given by $\mathrm{E}$ applied to the distribution over $Z$ that is internal to this collective, $\xi = \mathrm{E}\,\omega|_C\,\omega|_Z:Z\rightarrow Z$.

The idea is that the CA equations from above correspond to the regression

$$w(z)=c_1 z+c_2\xi(z)$$

while the PA equations correspond to the same regression with transformed coordinates

$$w(z)=c'_1(z-\xi(z))+c'_2\xi(z).$$

Does this make sense?

view this post on Zulip Tobias Fritz (Aug 01 2020 at 16:25):

Now there I'm a little confused. You have $\omega : I \to Z \otimes C$, and therefore $\omega|_C : C \to Z$, right? So then shouldn't the domain of $E$ also be $Z$?

view this post on Zulip Christoph Thies (Aug 01 2020 at 20:22):

Yes, something is wrong there. I am not at all confident about my understanding of the monad action. In particular, I am not sure how the monad is visible in the string diagrams. Here, however, it seemed to me that the Kleisli morphism $\omega|_C$ takes us up one step in the monad, which has no obvious corresponding step on the string connecting $s$ (or $t$) directly to the convolution on the same side of the diagram. Since the convolution following $\omega|_C$ is pointwise, we need to get down again, and we should be free to choose the way.

view this post on Zulip Tobias Fritz (Aug 02 2020 at 13:51):

Concerning the question of how the monad is visible in the string diagrams: it is not! At least not in the "plain" string diagrams like they're usually used. Depending on what exactly you want to do, this can either be a feature or a bug. Let me elaborate a bit on this.

Obviously when you do want to reference the monad explicitly, then it's a bug. For example, in the draft paper that we're currently writing, we need to reference the monad explicitly, and we therefore need to extend the Markov categories formalism in order to facilitate this. We do so by assuming that there is a bijective correspondence between deterministic morphisms $X \to PY$ and general morphisms $X \to Y$ for all $X$ and $Y$. In particular, we obtain a map $PX \to X$ to be interpreted as sampling from a distribution (which sounds similar to your $E$), and a deterministic map $X \to PX$ which plays the role of assigning to every point the Dirac delta distribution at that point.

The bijective correspondence between deterministic $X \to PY$ and general $X \to Y$ then lives on top of the string diagrams and doesn't really interact with them. I'm pretty sure that there are ways to do better, in the sense that one can probably have a graphical calculus in which that correspondence is itself part of the graphical syntax in an intuitive way, for example by using things like functorial boxes. But I don't think that this has been worked out yet.

On the other hand, the fact that the string diagrams do not reference the monad can also be a feature, because there are many Markov categories which are not Kleisli categories of monads. The "plain" string diagrams can still be interpreted in these categories as well, and theorems on Markov categories are often still applicable and interesting. Hence, the inability to reference the monad is what gives us greater generality.

Sorry if this is too much information! So am I understanding correctly that your statement about $\omega|_C$ taking us one step up is precisely the possibility to interpret it either as a deterministic morphism (meaning essentially a measurable map) $C \to PZ$ or as a generic Kleisli morphism $C \to Z$? If so, and if this distinction is important to you, then take a look at the string diagrams in the later sections of our draft; there, we use the bijective correspondence above by writing $f^\sharp : X \to PY$ for the deterministic counterpart of any $f : X \to Y$. In the other direction, you recover $f : X \to Y$ from $f^\sharp : X \to PY$ by composing with $\mathrm{samp} : PY \to Y$.

But if you're not sure whether you actually need to reference the monad, then it'll be better to simply work with the Kleisli morphism picture $\omega|_C : C \to Z$. I think that you should stick with this picture until you arrive at a point where you have to reference $P$ explicitly, which may already have happened, and then switch to the more expressive formalism.

blackwell_act.pdf

view this post on Zulip Christoph Thies (Aug 03 2020 at 08:29):

Thank you for all this, @Tobias Fritz! I am very happy this makes sense to you.

When I wrote the ill-defined $E\,\omega|_C$ with $E:PZ\rightarrow Z$, I wasn't thinking about the monad much. But I knew that the purpose of the output of $\omega|_C$ is to serve as input to compute a collective property depending on the collective's composition. In the convolution, this collective property then interacts with the individual property given by the output $Z$ of $s$. It seems to me that if $\omega|_C$ connects directly to the convolution, the latter computes the pairwise interaction between the individual property and the individual properties of the other individuals in the same collective. In general, however, this is not enough, since the convolution is to combine the individual property with the collective property as a whole. That being said, for the case I have in mind this distinction is not necessary, because $E:PZ\rightarrow Z$ seems to do the same as the 'implicit' summation(?).

With respect to your distinction between deterministic morphisms and Kleisli morphisms, I am therefore quite convinced that $\omega|_C$ should be considered deterministic here. To an individual it assigns the distribution over $Z$ that represents the property distribution in the individual's collective. The map may become non-deterministic when individuals can belong to multiple collectives (overlapping collectives), but this should not be required for the present purposes.

I didn't have time to get an understanding of your draft (or the other paper you mentioned) yet but it makes me incredibly happy that these questions are relevant to you. I am very much looking forward to reading your draft in more detail.

PS: I'm not sure this helps, but I think that the random variable $Z\otimes C$ is somewhat similar in spirit to the copower described in Jacobs (2017), 'Hyper Normalisation and Conditioning for Discrete Probability Distributions'. The fact that individuals are organised in collectives turns the full distribution over $Z$ into a hyperdistribution of collective distributions. The normalisations discussed in this paper are surely relevant for what I am trying to do, but I don't know how exactly yet.

view this post on Zulip Tobias Fritz (Aug 03 2020 at 13:59):

You're welcome!

I don't know enough about your situation, and in particular about what individuals and collectives are, to follow your arguments in detail. But I get the impression that your $E$ is exactly what we call the sampling map, in which case I think that you should be able to work with $\omega|_C : C \to Z$ after all; because composing the deterministic $(\omega|_C)^\sharp : C \to PZ$ (in our notation) with the sampling map $PZ \to Z$ produces exactly its non-deterministic counterpart $C \to Z$. Note that by virtue of being a Kleisli morphism, any morphism $C \to Z$ also assigns a distribution over $Z$ to every element of $C$, since this is what Markov kernels (Kleisli morphisms of the Giry monad) do. This is exactly the same as what the deterministic counterpart $C \to PZ$ does; in other words, these two maps encode the same information, and the difference between them is merely syntactical. And composing with $E$ is exactly what takes you from the latter to the former.
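This correspondence can be sketched concretely in the finite discrete case. Below is a toy Python encoding (all names and the example numbers are my own): a "deterministic" map $C \to PZ$ returns a distribution as a value, and composing with a sampling map $PZ \to Z$ gives the Kleisli morphism $C \to Z$.

```python
# Toy discrete sketch (my own encoding) of the sharp/samp correspondence:
# (omega|_C)^sharp : C -> PZ returns a distribution as a plain value;
# samp : PZ -> Z draws from it; their composite is the Kleisli morphism C -> Z.
import random

def omega_C_sharp(c):
    """Deterministic map C -> PZ: a collective's composition over Z."""
    return {'coll1': {'z1': 0.8, 'z2': 0.2},
            'coll2': {'z1': 0.1, 'z2': 0.9}}[c]

def samp(dist, rng):
    """Sampling map PZ -> Z: draw one point from a finite distribution."""
    r, acc = rng.random(), 0.0
    for z, p in dist.items():
        acc += p
        if r < acc:
            return z
    return z  # guard against floating-point round-off

def omega_C(c, rng):
    """Kleisli morphism C -> Z: samp composed with (omega|_C)^sharp."""
    return samp(omega_C_sharp(c), rng)

rng = random.Random(1)
draws = [omega_C('coll1', rng) for _ in range(10_000)]
print(draws.count('z1') / len(draws))  # close to 0.8
```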

Does this make sense to you? Apologies if this has already been obvious.

The connection with Jacobs's hypernormalization is also intriguing to me. I also have the impression that hypernormalization is a deep and somehow fundamental concept for probability. This raises the question of whether it can be implemented within the Markov categories framework. I think that doing so will in particular require generalizing it beyond the discrete case. I am now realizing that some recent additions to our draft seem to shed light on this, but it's a bit too preliminary now for me to say anything further.

view this post on Zulip Christoph Thies (Aug 03 2020 at 18:54):

Yes, that makes sense. I like the sampling map very much. Also, you're right that $E$ should be this map. It is much nicer than taking the average.

By individual and collective I mean the following: the population we're looking at is made up of units that each have a property described by $Z$. Moreover, the units are organised into collectives that partition the population. The random variable $Z\otimes C$ denotes the property of a randomly drawn unit along with an 'identifier' of the collective that unit is part of. The map $(\omega|_C)^\sharp$ turns the identifier into the composition of the collective it denotes, given as a distribution over $Z$.

view this post on Zulip Christoph Thies (Aug 03 2020 at 20:55):

I have a manuscript I wrote last year and have failed to publish so far. I've given up trying for now, and that's fine. In the manuscript I try to make the point I am aiming for with this discussion, but in plain language and through basic calculations. The value of the manuscript lies not in mathematical insights but, I hope, in the clarification of certain concepts and methods in evolutionary biology. I am not sure how much sense it makes to someone not familiar with the particular research questions, nor am I sure about the correctness of the arguments. I do think, however, that the intuition behind these diagrams (Model2.jpg) is adequately described in the manuscript, albeit in different terms.

I have been contemplating for a while whether to post the manuscript on a preprint server. My supervisor and coauthor gave his permission. If you would like to read the manuscript I would be happy to post it.

view this post on Zulip Nathaniel Virgo (Aug 04 2020 at 03:36):

Tobias Fritz said:

The bijective correspondence between deterministic $X \to PY$ and general $X \to Y$ then lives on top of the string diagrams and doesn't really interact with them. I'm pretty sure that there are ways to do better, in the sense that one can probably have a graphical calculus in which that correspondence is itself part of the graphical syntax in an intuitive way, for example by using things like functorial boxes. But I don't think that this has been worked out yet.

I've been thinking about this sort of thing recently. I don't know how helpful this is for the discussion, but here's how I'd draw what you describe above. Note that I haven't worked out anything formally, this is just pictures. (Not that being pictures makes them informal, I just haven't gone through and worked out exactly what axioms everything should obey, and it's possible something ends up breaking the whole thing.)

Following your paper with Paolo, "Bimonoidal Structures of Probability Monads", we represent the object $PX$ as a "tube" surrounding $X$, like this ($X$ on the left, $PX$ on the right):

image.png

Then a monad consists of families of morphisms

image.png

such that

image.png

These are just string diagram representations of the usual commutative square and triangle, asserting that the various ways of going from $PPX$ to $X$ and from $PX$ to $PX$ should be equal.
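For the finite distribution monad these laws can be checked concretely. Here is an editorial Python sketch (representation and names are my own, not from the thread): the unit is the delta distribution, the multiplication averages inner distributions, and the unit and associativity laws hold on the nose.

```python
from collections import defaultdict

def freeze(d):
    """Hashable representation of a finite distribution."""
    return tuple(sorted(d.items()))

def eta(x):
    """Unit eta : X -> PX, the delta distribution at x."""
    return {x: 1.0}

def mu(pp):
    """Multiplication mu : PPX -> PX, averaging the inner distributions.
    The outer layer is a dict {frozen inner distribution: probability}."""
    out = defaultdict(float)
    for inner, p in pp.items():
        for x, q in dict(inner).items():
            out[x] += p * q
    return dict(out)

p = {"a": 0.25, "b": 0.75}

# Unit laws: mu . eta_P = id and mu . P(eta) = id on PX.
assert mu({freeze(p): 1.0}) == p
assert mu({freeze(eta(x)): q for x, q in p.items()}) == p

# Associativity: flattening the outer layer first agrees with
# flattening the inner layer first on an element of PPPX.
inner1, inner2 = freeze({"a": 1.0}), freeze(p)
ppp = {freeze({inner1: 0.5, inner2: 0.5}): 1.0}
lhs = mu(mu(ppp))                                                  # mu . mu_P
rhs = mu({freeze(mu(dict(fpp))): q for fpp, q in ppp.items()})     # mu . P(mu)
assert lhs == rhs
```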

Since we're in a Markov category and want to distinguish between stochastic and deterministic morphisms, I'll draw stochastic morphisms with a curved edge and deterministic ones as square, like this

image.png

We want to set up an equivalence between stochastic morphisms of the form $f:X\to Y$ and deterministic morphisms of the form $f':X \to PY$, as drawn above.

view this post on Zulip Nathaniel Virgo (Aug 04 2020 at 03:36):

In a suitable class of Markov categories (I guess actually in any Kleisli category) we will have another family of canonical morphisms, the "sampling" operation $\varepsilon:PX\to X$ for each $X$, which maps points of $PX$ stochastically to $X$. Let's draw that like this:

image.png

Then we can simply write

image.png

which I find quite pleasing.

In the other direction we have

image.png

which could be taken as the definition of $f'$. In symbols this says $f' = \eta\,;Pf$. Although $f$ is a stochastic morphism, we can regard $Pf:PX\to PY$ as a deterministic map, given by the Chapman-Kolmogorov equation.

We should also have these equations for how $\varepsilon$ interacts with $\eta$ and $\mu$:

image.png

They look like simplified versions of the monad laws, which makes some intuitive sense to me, because in the Kleisli category every object $X$ is really an object of the form $PX$ in the base category. So these are actually the monad laws, just with one level of application of $P$ removed. Because of this, I'd guess that all the stuff in the bimonoidal structures paper will also work in this context, but I haven't worked through it.
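The equations for $\varepsilon$ can also be seen concretely in the finite case. A hedged Python sketch (my own discrete toy, not from the thread): sampling a delta distribution returns its point ($\varepsilon\circ\eta = \mathrm{id}$), and sampling from the flattened distribution agrees in law with sampling twice.

```python
import random
from collections import defaultdict

def eta(x):
    """Unit eta : X -> PX, the delta distribution at x."""
    return {x: 1.0}

def mu(pp):
    """Multiplication mu : PPX -> PX; inner distributions are stored as
    hashable tuples of (value, probability) pairs."""
    out = defaultdict(float)
    for inner, p in pp.items():
        for x, q in dict(inner).items():
            out[x] += p * q
    return dict(out)

def samp(dist):
    """The sampling map epsilon : PX -> X."""
    xs, ps = zip(*dist.items())
    return random.choices(xs, weights=ps)[0]

# epsilon . eta = id: sampling a delta distribution returns its point.
assert samp(eta("a")) == "a"

# epsilon . mu agrees in law with sampling twice (epsilon after epsilon):
inner1 = (("heads", 0.9), ("tails", 0.1))
inner2 = (("heads", 0.1), ("tails", 0.9))
pp = {inner1: 0.5, inner2: 0.5}
flat = mu(pp)                      # flatten, then sample once ...
two_stage = samp(dict(samp(pp)))   # ... or sample an inner distribution, then sample from it
print(flat)        # {'heads': 0.5, 'tails': 0.5} up to rounding
print(two_stage)   # 'heads' or 'tails', each with probability 1/2
```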

I've been using notation like this informally for a while. It seems to be quite useful, because it combines the convenience of Markov categories with the ability to consider distributions explicitly when needed.

view this post on Zulip Nathaniel Virgo (Aug 04 2020 at 04:17):

For more on the "tube diagram" notation there are a couple of blog posts by Joe Moeller, at https://joemathjoe.wordpress.com/2020/06/23/a-different-string-presentation-of-monads/ and https://joemathjoe.wordpress.com/2020/07/09/tube-diagrams-for-monoidal-monads/, as well as the paper by Tobias and Paolo, at https://arxiv.org/abs/1804.03527

view this post on Zulip Christoph Thies (Aug 04 2020 at 07:18):

Christoph Thies said:

... the intuition behind these diagrams (Model2.jpg) ...

I think I got this model all wrong. The map $*_{+_Z}(\text{id}_Z\otimes\text{samp}\,(\omega|_C)^\sharp):Z\otimes C\rightarrow Z$ seems to refer to how the variables are utilised within $s$ and $t$. Ultimately, we're interested in a model of the map $T:PZ\rightarrow PZ$. I'll have to think more about this.

view this post on Zulip Tobias Fritz (Aug 04 2020 at 12:16):

Yep, functorial boxes and shadings are great! What seems to be missing so far is a complete set of rules for how they interact with the monoidal structure and Markov category structure; if such a thing was available, then we'd certainly be using it already, and probably Christoph and some others would do so as well. So if you or someone else were to propose a complete string diagram calculus, say for affine symmetric monoidal monads on cartesian monoidal categories, then that would come in very useful! One thing to keep in mind is that the string diagrams in our bimonoidal structures paper are at the level of the original category, meaning that the diagrams depict everything at the level of deterministic morphisms, while in this thread we all seem to be using string diagrams in a Kleisli/Markov category.

Yes, Tomáš has also proposed to use a separate box style for deterministic morphisms. This could also be useful, but there are a couple of caveats that make me personally uncertain about whether it should really be done:

1) What if a morphism is not known to be deterministic a priori, but later turns out, in the course of a proof, to be deterministic? Does it then get denoted differently, and could that be confusing?

2) What if neither morphism in a given diagram is deterministic, but a certain composite or subdiagram is?

3) On a vaguely related note, in the work that we're currently doing on the comparison of statistical experiments, it's becoming increasingly clear that properties that hold merely "almost surely" come up a lot, as do almost surely deterministic morphisms.

Perhaps there's a more elaborate notation to take care of the latter two points?

view this post on Zulip Tobias Fritz (Aug 04 2020 at 12:42):

@Christoph Thies, I can now follow the explanation of individuals and collectives and understand how it models population biology. I still think that $\omega|_C : C \to Z$ achieves the same thing as $\omega|_C^\sharp : C \to PZ$. If you use the identifier of a collective as input to $\omega|_C$, then you simply get a random element of $Z$ as output, and if you use the same input many times, then you get different elements sampled from the corresponding distribution. That's why $\omega|_C$ and the composite $\mathrm{samp} \circ \omega|_C^\sharp$ are one and the same Kleisli morphism. Right? Of course this is not really specific to probability theory or Markov categories but part of the formalism of Kleisli categories in general.

view this post on Zulip Christoph Thies (Aug 04 2020 at 13:05):

I'll answer quickly because I have to go, please excuse mistakes. The reason I would like to think of the output of $\omega|_C$ as an element of $PZ$ is that a sample taken subsequent to $\omega|_C$ has attached to it the distribution over $Z$ that characterises the collective the sample is part of. The next step (the convolution) is the interaction between a collective effect computed from the attached distribution and an individual effect computed from the property of the sample itself. If $Z=\mathbb{R}$ and the computations and interaction within $s$ are identities and addition, respectively, the difference between $\omega|_C$ and $(\omega|_C)^\sharp$ might not matter.

view this post on Zulip Nathaniel Virgo (Aug 04 2020 at 13:23):

On having a special style for deterministic morphisms, I tend to use the square box for "known to be deterministic" and the rounded edge for "possibly stochastic." If a composite turned out to be deterministic, I'd just write it as something like

image.png

I see it more as a typographical convention than a formal thing - I find it makes the diagrams easier to read in my paper notes.

I keep thinking there should be a better notational way to take care of "almost surely" in general, but I haven't hit on it yet.

On the Kleisli category versus the original category, what I was thinking this morning was that, if we want to, we can restrict the domain of P to its Kleisli category, and then we end up with a monad defined on the Kleisli category instead of the original category, and we should be able to use a similar graphical calculus for that. I speculated that a lot of the stuff from the bimonoidal structures paper will carry over to that context, but I agree that that work needs to be done.

view this post on Zulip Tobias Fritz (Aug 04 2020 at 13:41):

Christoph Thies said:

I'll answer quickly because I have to go, please excuse mistakes. The reason I would like to think of the output of $\omega|_C$ as an element of $PZ$ is that a sample taken subsequent to $\omega|_C$ has attached to it the distribution over $Z$ that characterises the collective the sample is part of. The next step (the convolution) is the interaction between a collective effect computed from the attached distribution and an individual effect computed from the property of the sample itself. If $Z=\mathbb{R}$ and the computations and interaction within $s$ are identities and addition, respectively, the difference between $\omega|_C$ and $(\omega|_C)^\sharp$ might not matter.

Okay, great! If the collective effect computed from the distribution depends on the distribution in a nonlinear way, then I agree that $PZ$ will have to be used. Whereas if the effect depends on the distribution linearly, then it can be computed by sampling from the distribution first and then using the resulting element of $Z$ as input to the effect; because then the overall effect is precisely the one given by taking the expectation over all the samples, and the Kleisli composition takes care of the formation of that expectation for you.
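The linear/nonlinear distinction can be illustrated numerically (an editorial sketch, not from the thread): the mean is linear in the distribution, so "sample first, then apply the effect" recovers it in expectation; the variance is nonlinear, and the sample-first strategy fails for it.

```python
import random

dist = {0.0: 0.5, 1.0: 0.5}

def samp(d):
    """Sampling map: draw one element from a finite distribution."""
    xs, ps = zip(*d.items())
    return random.choices(xs, weights=ps)[0]

def mean(d):
    """A linear effect of the distribution."""
    return sum(x * p for x, p in d.items())

def variance(d):
    """A nonlinear effect of the distribution."""
    m = mean(d)
    return sum(p * (x - m) ** 2 for x, p in d.items())

random.seed(1)
n = 20_000

# Linear effect: averaging over single samples recovers mean(dist).
avg_sample = sum(samp(dist) for _ in range(n)) / n
print(abs(avg_sample - mean(dist)) < 0.05)  # True (within sampling error)

# Nonlinear effect: a single sample is a delta distribution with variance 0,
# so "sample first, then apply the effect" always reports 0.0 instead of 0.25.
avg_var = sum(variance({samp(dist): 1.0}) for _ in range(n)) / n
print(avg_var, variance(dist))  # 0.0 0.25
```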

I imagine that there are plenty of effects in population biology which depend on the distribution in a nonlinear way. And this is the case in your situation? For example, I guess a diverse population has higher fitness than a uniform one, so that the fitness is a nonlinear function of the distribution? Is this more or less right? (Apologies if I'm using the terms incorrectly; I know that fitness usually refers to individuals, so perhaps I should be referring to something like adaptability at the population level when trying to express the advantage of diversity?)

view this post on Zulip Tobias Fritz (Aug 04 2020 at 14:40):

Nathaniel Virgo said:

On having a special style for deterministic morphisms, I tend to use the square box for "known to be deterministic" and the rounded edge for "possibly stochastic." If a composite turned out to be deterministic, I'd just write it as something like

image.png

I see it more as a typographical convention than a formal thing - I find it makes the diagrams easier to read.

Cool. So then in the situation of the following statement in Infinite products and zero-one laws in categorical probability,

Lemma5.2.png

would you keep the phrase "$sp$ is deterministic" as it is, since expressing it string-diagrammatically would not simplify anything, and use a separate notation for deterministic morphisms only when it can clearly help the reader? That sounds like something worth considering.

On the Kleisli category versus the original category, what I was thinking this morning was that, if we want to, we can restrict the domain of P to its Kleisli category, and then we end up with a monad defined on the Kleisli category instead of the original category, and we should be able to use a similar graphical calculus for that. I speculated that a lot of the stuff from the bimonoidal structures paper will carry over to that context, but I agree that that work needs to be done.

Right. One thing to be careful with is that a monad usually does not extend to a monad on its Kleisli category, as I've had to learn the hard way by being confused about it and then being corrected by my coauthors. The (only) thing that fails is the naturality of the unit! In the probability monad context, when you compose a non-deterministic Markov kernel $f : X \to Y$ with $\delta_Y : Y \to PY$, then the composite $\delta_Y \circ f$ returns a random delta distribution on $Y$; but the other composite $Pf \circ \delta_X$ is actually deterministic, and its image is not contained in the delta distributions. The two coincide only after composing with the sampling map $PY \to Y$.
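The failure of naturality can be made concrete in the discrete case. A hedged Python sketch (my own toy example with a coin-flip kernel; all names are invented) showing that $\delta_Y \circ f$ and $Pf \circ \delta_X$ really are different kernels $X \to PY$:

```python
import random

def f(x):
    """A non-deterministic Markov kernel f : X -> Y (a fair coin, say)."""
    return {"heads": 0.5, "tails": 0.5}

def delta(y):
    """The unit delta : Y -> PY, the delta distribution at y."""
    return {y: 1.0}

def samp(d):
    """The sampling map PY -> Y."""
    ys, ps = zip(*d.items())
    return random.choices(ys, weights=ps)[0]

def Pf_of_delta(x):
    """Pf . delta_X: deterministic, pushes the delta at x forward along f.
    Its output f(x) is generally NOT a delta distribution."""
    return f(x)

def delta_of_f(x):
    """delta_Y . f: first sample y ~ f(x), then form the delta at y.
    Its output IS a delta distribution, but a random one."""
    return delta(samp(f(x)))

print(Pf_of_delta("x"))  # {'heads': 0.5, 'tails': 0.5} -- not a delta
print(delta_of_f("x"))   # {'heads': 1.0} or {'tails': 1.0} -- a random delta
# The two kernels differ, so the unit is not natural on the Kleisli category;
# they agree only after composing with the sampling map PY -> Y.
```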

view this post on Zulip Christoph Thies (Aug 04 2020 at 19:11):

Tobias Fritz said:

Okay, great! If the collective effect computed from the distribution depends on the distribution in a nonlinear way, then I agree that $PZ$ will have to be used. Whereas if the effect depends on the distribution linearly, then it can be computed by sampling from the distribution first and then using the resulting element of $Z$ as input to the effect; because then the overall effect is precisely the one given by taking the expectation over all the samples, and the Kleisli composition takes care of the formation of that expectation for you.

That seems correct to me.

Tobias Fritz said:

I imagine that there are plenty of effects in population biology which depend on the distribution in a nonlinear way.

Yes, higher-order effects. For example, not only do the units subject to causal processes evolve, but so do the units that constitute those processes.

Tobias Fritz said:

And this is the case in your situation?

For now I don't need this, linear is sufficient. My goal is to recreate the multilevel Price equation, an equation that formalises the biological process of selection, in category-theoretic terms. The Price equation is equivalent to a linear regression.

Tobias Fritz said:

For example, I guess a diverse population has higher fitness than a uniform one, so that the fitness is a nonlinear function of the distribution?

Yes, that would be an example where knowledge of the average is insufficient to determine fitness.

Tobias Fritz said:

Is this more or less right?

Yes, perfect!

Tobias Fritz said:

Apologies if I'm using the terms incorrectly

That's fine. Also, many terms are not clearly defined.

Tobias Fritz said:

I know that fitness usually refers to individuals, so perhaps I should be referring to something like adaptability at the population level when trying to express the advantage of diversity?

That's a far-reaching question. What replication could mean on higher levels and how it could be formalised is largely unclear. Let's think about this once we're done with selection!

view this post on Zulip Christoph Thies (Aug 05 2020 at 22:16):

I have a new version of the model. It's surely not without mistakes but it looks like a big step to me. It seems to do what I hoped for and more. The diagram shows the PA version of the equations above. In the diagram, $w_C:C\rightarrow C$ and $w_Z:Z\rightarrow Z$ denote collective and individual fitness, respectively; $E=\text{samp}\circ\omega|_C^\sharp:C\rightarrow Z$.

In the CA version, the right leg just applies $w_Z$ and discards $C$. The left and right legs of the diagram represent $s$ and $t$, respectively ($C$ is discarded in both legs). The intuition behind it is the following: in $s$, the collective phenotype (the output of $E$) is evaluated in $w_C$. $\omega|_C$ (sorry for the notation) converts the outcome to the corresponding distribution over $Z$. In $t$, the individual phenotype $Z$ interacts with the collective phenotype at the convolution. The output is evaluated in $w_Z$ to give the (relative?) distribution over $Z$. The mixture $M$ combines the two copies of the system.

I'm sure something is wrong around $w_Z$. I think it has to do with normalisation and the fact that the collective distributions are not full distributions (not summing to one).

view this post on Zulip Christoph Thies (Aug 06 2020 at 12:40):

I have to admit I am somewhat overwhelmed by how much sense this makes, @Tobias Fritz. Everything fits together. I feel compelled to post my manuscript on BioRxiv now, as a draft. Do you think this might be a bad idea? I'd post it as I wrote it last year, without category theory.

view this post on Zulip Tobias Fritz (Aug 06 2020 at 13:32):

Christoph, I'm sorry, but I personally am not competent to comment on a manuscript outside of my areas of expertise. Perhaps you can ask another mathematical biologist who has studied the Price equation for feedback? For example, Matteo Smerlak and his coauthors have worked on mathematically sophisticated approaches to evolution involving probabilistic dynamics, for example in Limiting fitness distributions in evolutionary dynamics. Perhaps they would be able to comment? In any case, please let us know if/when you post it, as I'd be curious to take a look and learn a bit more about it, even if I won't be able to assess its merits.

view this post on Zulip Christoph Thies (Aug 06 2020 at 13:39):

I'll just post it here then, for now: CAPA.pdf

view this post on Zulip Tobias Fritz (Aug 06 2020 at 23:21):

That looks like a really nice paper! I don't think that I'll be able to read it in detail, but the parts that I've read (in particular the introduction) are quite interesting and made good sense. So I very much hope that this will be of interest to mathematical biologists as well!

view this post on Zulip Christoph Thies (Aug 07 2020 at 05:11):

Thank you, @Tobias Fritz !

view this post on Zulip Christoph Thies (Aug 07 2020 at 08:21):

I would like to fix the left leg of the diagram above. With the incorrect composition $w_C\circ E$ (with $E:C\rightarrow Z$ and $w_C:C\rightarrow C$) I am trying to say that collective selection $w_C$ acts on $C$ but is determined by the output of $E$. How can I express this?

view this post on Zulip Tobias Fritz (Aug 07 2020 at 12:46):

Now I'm admittedly getting more confused. I thought that your $E$ was a morphism $PZ \to Z$, namely the sampling map? But now the input of $E$ is $C$. I also don't know what $w_C$ and $w_Z$ are.

view this post on Zulip Christoph Thies (Aug 07 2020 at 13:36):

I am sorry for the confusion, @Tobias Fritz. I didn't write this down correctly. Here are the maps involved:

$$\omega|_C:C\rightarrow Z\\ E=\text{samp}\circ\omega|_C^\sharp:C\rightarrow Z\\ *:Z\otimes Z\rightarrow Z\\ M:Z\otimes Z\rightarrow Z$$

I was thinking of $w_C$ and $w_Z$ as somehow representing the part of the complete process that is left to explain. Everything else seems specified. We have $w_Z:Z\rightarrow Z$ and I thought also $w_C:C\rightarrow C$. Now it seems $w_C$ needs another input to determine the mapping that acts on $C$, like this: Model4.png.

But then it would seem that the same is required for $w_Z$: one input determines the function that acts on the other. And both inputs belong to the same individual. That's nice. Does it make sense?

view this post on Zulip Christoph Thies (Aug 07 2020 at 13:57):

Like this: Model5.png

Where is my regression? :rolling_eyes:

view this post on Zulip Tobias Fritz (Aug 07 2020 at 14:33):

Well, as I've pointed out a number of times before, we have $\omega|_C = \mathrm{samp} \circ \omega|_C^\sharp$, so it seems to me that this coincides with what you now denote $E$.

So your $w_Z$ and $w_C$ are the same components of the model as the morphisms that you had previously denoted $s$ and $t$?

view this post on Zulip Christoph Thies (Aug 07 2020 at 14:54):

Tobias Fritz said:

Well, as I've pointed out a number of times before, we have $\omega|_C = \mathrm{samp} \circ \omega|_C^\sharp$, so it seems to me that this coincides with what you now denote $E$.

Yes. I think that's ok. In the additive case any random element of the associated collective will probably do. I was getting ahead of myself talking about functions of distributions.

So your $w_Z$ and $w_C$ are the same components of the model as the morphisms that you had previously denoted $s$ and $t$?

No, it's like this, I think: Model5.1.png

view this post on Zulip Tobias Fritz (Aug 07 2020 at 15:18):

Okay! Then I'm not sure why to use two different symbols to denote the same morphism, but otherwise it makes sense to me :smile:

view this post on Zulip Christoph Thies (Aug 07 2020 at 16:42):

Nice!

Tobias Fritz said:

I'm not sure why to use two different symbols to denote the same morphism

Which symbols are you referring to?

but otherwise it makes sense to me :smile:

That makes me very happy!

view this post on Zulip Tobias Fritz (Aug 07 2020 at 17:09):

Great! I thought that we had agreed that $E$ and $\omega|_C$ denote the same morphism because they're both equal to $\mathrm{samp} \circ \omega|_C^\sharp$. That's what I've been referring to.

view this post on Zulip Christoph Thies (Aug 07 2020 at 17:17):

I see. Yes. Here's both versions: Model5-PA.png, Model5-CA.png.

view this post on Zulip Christoph Thies (Aug 07 2020 at 20:20):

I think my regression is not far away. Consider $w_Z$ in CA. Suppose $w_Z$ acts on the left input with the right input controlling the mapping. For a sample $z\in Z$ we therefore get a map $w_Z(-\otimes z):Z\rightarrow Z$. Since our individuals breed true (no mutation, i.e., offspring cannot differ from their parents in phenotype $z$) and we have no migration (no influx of $z$-values not previously present in the population), with the projection $\pi_Z:Z\otimes C\rightarrow Z$ we can assume

$$w_Z(-\otimes z)\circ\pi_Z(\omega) \ll \pi_Z(\omega).$$

Therefore we have a Radon-Nikodym derivative $\widehat{w_Z} = \frac{\text{d}(\pi_Z(\omega))}{\text{d}\omega}:Z\rightarrow \mathbb{R}$. For $Z=\mathbb{R}$ we have $\widehat{w_Z}:\mathbb{R}\rightarrow\mathbb{R}$, ready for regression!

view this post on Zulip Christoph Thies (Aug 07 2020 at 21:06):

Sorry, @Tobias Fritz, I messed it up completely! The RN derivative is given by

$$\widehat{w_Z}=\frac{\text{d}(w_Z(-\otimes z)\circ\pi_Z(\omega))}{\text{d}(\pi_Z(\omega))}:Z\rightarrow\mathbb{R}.$$

Does this make sense?

view this post on Zulip Christoph Thies (Aug 07 2020 at 21:22):

How convenient that the RN derivative automatically yields exponential behaviour of the frequencies in $\pi_Z(\omega)$ when iterated.
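For finitely supported distributions the RN derivative is just a ratio of probability mass functions, and iterating the reweighting does amplify fitter types exponentially. A toy Python sketch (an editorial illustration with invented names, not from the thread):

```python
def reweight(p, w):
    """One selection step: scale p pointwise by the fitness w, then normalise.
    The discrete Radon-Nikodym derivative of the new measure with respect to
    the old one is w(z) divided by the total scaled mass (relative fitness)."""
    scaled = {z: w[z] * q for z, q in p.items()}
    total = sum(scaled.values())
    return {z: q / total for z, q in scaled.items()}

p = {"A": 0.5, "B": 0.5}
w = {"A": 2.0, "B": 1.0}   # type A has twice the fitness of B

for _ in range(10):
    p = reweight(p, w)

# After n steps the odds A:B are multiplied by 2^n, so A comes to dominate:
# here p["A"] = 2^10 / (2^10 + 1) = 1024/1025.
print(round(p["A"], 4))  # 0.999
```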

view this post on Zulip Tobias Fritz (Aug 08 2020 at 13:54):

Christoph, I'm afraid that I'll have to take a break from the discussion (for now): I'm moving to Austria! And organizing things is now starting to keep me quite busy.

view this post on Zulip Christoph Thies (Aug 08 2020 at 18:43):

Yes, sure, Tobias. Those last days chatting with you were quite exciting for me. I apologise if I was not considerate towards your time. All the best for your move! Austria is very nice.

If I may ask, do you think you'll be around again anytime soon? I need to finish my PhD thesis before long and reporting the things we discussed here would be very useful for me. It seems to me I'm not far away but I need your help :see_no_evil:

view this post on Zulip Paolo Perrone (Sep 03 2020 at 23:02):

In case you are interested, here are all three videos by Prakash: https://www.youtube.com/playlist?list=PLaILTSnVfqtI6MDWQUqB2mIhx1USzXkj4

view this post on Zulip Alexander Gietelink Oldenziel (Nov 19 2020 at 09:44):

Tomáš Gonda said:

Does anyone know of a theorem in categorical probability that could be regarded as a categorical version of the Radon-Nikodym Theorem? I have been wondering about this a couple of times, but a short literature search never provided a result I'd be happy with.

I don't know if it has already been mentioned in this thread (I didn't read all of it), but Bunge & Funk describe a topos-theoretic Radon-Nikodym theorem in their book Singular Coverings of Toposes.

view this post on Zulip Peter Arndt (Nov 20 2020 at 11:49):

Hi @Alexander Gietelink Oldenziel, could you say which theorem in Bunge&Funk's book you mean?

view this post on Zulip Alexander Gietelink Oldenziel (Nov 20 2020 at 14:46):

Peter Arndt said:

Hi Alexander Gietelink Oldenziel, could you say which theorem in Bunge&Funk's book you mean?

Hi Peter! I was thinking of section 6.2, about inverting distributions.
There is another paper where Marta Bunge explicitly says it is an analog of Radon-Nikodym.
I spent some time thinking about analogies of conditional probability and sigma algebras in this context. We can talk a little about it if you want, though I didn't get very far.

view this post on Zulip Peter Arndt (Nov 23 2020 at 00:14):

Ah, wow, looks like quite a journey from classical Radon-Nikodym to that chapter!
Yes, I would love to talk about that, just need to find some time...

view this post on Zulip Christoph Thies (Nov 28 2020 at 14:48):

Christoph Thies said:

Here's both versions: Model5-PA.png, Model5-CA.png.

Hello,

I have been thinking more about the equations I tried to build before. Using diagrams like those by @Nathaniel Virgo above, the collective is now represented explicitly in terms of the monad.

CAPA_monad.png

view this post on Zulip Christoph Thies (Nov 30 2020 at 08:45):

In an experimental setting, individuals are organised into collectives that in turn make up the population. The equations describe an episode of selection that acts on both the individual and the collective phenotypes.

The element of $PPZ$ on the left comes about as follows, I think. A collection of collectives of individuals is given as a collection of distributions over $Z$, the space of individual phenotypes, that is, an element of a coproduct $\oplus PZ$. The monad unit $\delta : PZ \rightarrow PPZ$ induces a map $\oplus\delta : \oplus PZ\rightarrow PPZ$. This situates the multilevel Price equation as in Gardner, A., The genetical theory of multilevel selection, Journal of Evolutionary Biology, 2015, 28, 305-319, Equation (5) (the author considers the genetic value as phenotype), in the context of the diagram below.

Context.png

view this post on Zulip Christoph Thies (Dec 08 2020 at 14:34):

In the lower branch of the monad diagrams above, the collective composition, i.e., the inner distributions, should remain unchanged. The lower branch therefore has a side branch that keeps the inner tube so that it can be restored after $w_C$. This looks a little awkward. Is there a more elegant way to represent this invariance? Could the inner distributions tunnel through the box $w_C$? :caterpillar:

view this post on Zulip Christoph Thies (Mar 23 2021 at 19:09):

I made some progress on this and wonder if someone is interested or would have a look to point out mistakes.

Consider the probability monad $P$ and $Z\colon\mathsf{FinSet}$. Then I'd like to write the two models sketched above as follows.
CAPAInMonads.png

Moreover, the map $w_I\colon PZ\to PZ$ (and, similarly, $w_P\colon PPZ\to PPZ$) satisfies the diagrams below.
MonadHomomorphism.png

The latter diagrams seem similar to those in the definition of morphisms of monads on the nLab (https://ncatlab.org/nlab/show/monad, section "The bicategory of monads"), but I can't follow the description there. Is it correct to say that $w_I\colon (Z, P)\to (Z, P)$ is a morphism of monads with 1-cell $1_Z\colon Z\to Z$ (and a 2-cell that I cannot write but that seems to be the identity as well)?

view this post on Zulip John Baez (Mar 23 2021 at 20:09):

What would help you follow the definition in the nLab?

view this post on Zulip Paolo Perrone (Mar 23 2021 at 21:22):

A morphism of monads is first of all a natural transformation. Do you have such a map $w_I$ for all objects $Z$, or just for one?

view this post on Zulip Christoph Thies (Mar 24 2021 at 01:44):

John Baez said:

What would help you follow the definition in the nLab?

I suppose I'd have to learn what exactly bicategories are. I've dodged this so far, as I'm afraid they'll drag me in further. It always seems to make sense to think one level up.

view this post on Zulip John Baez (Mar 24 2021 at 01:47):

Okay, you don't need to know what a bicategory is. If you're trying to understand a morphism of monads, that doesn't matter much.

view this post on Zulip Christoph Thies (Mar 24 2021 at 02:05):

That's what I was hoping! Could you point me to a reference that describes morphisms of monads without bicategories?

view this post on Zulip John Baez (Mar 24 2021 at 02:08):

No. I'm sure one exists; I just don't know it. I would just look at the nLab page's definition of "morphism of monads", which does not require that you know about bicategories.

view this post on Zulip John Baez (Mar 24 2021 at 02:09):

Take that definition, and where they say "1-cell" read "functor". Where they say "2-cell" read "natural transformation".

view this post on Zulip John Baez (Mar 24 2021 at 02:10):

Where they say "monad in K" read "monad in Cat", i.e. plain old monad.

view this post on Zulip Christoph Thies (Mar 24 2021 at 02:10):

I'll try that. Thank you!

view this post on Zulip John Baez (Mar 24 2021 at 02:11):

I typed "monad morphism" into Google and instantly got this:

https://mathoverflow.net/questions/92093/functors-between-monads-what-are-these-really-called

view this post on Zulip John Baez (Mar 24 2021 at 02:11):

This is a guy who defines morphisms of monads without knowing what they're called.

view this post on Zulip John Baez (Mar 24 2021 at 02:12):

His "natural map" must be a natural transformation.

view this post on Zulip John Baez (Mar 24 2021 at 02:13):

With luck this definition will exactly match the nLab definition if you translate between the terminologies. With luck the key equations will agree. If you can get them to match up, you've probably got the right idea.

view this post on Zulip Christoph Thies (Mar 24 2021 at 03:11):

Paolo Perrone said:

A morphism of monads is first of all a natural transformation. Do you have such a map $w_I$ for all objects $Z$, or just for one?

I have a map $w_I\colon PZ\to PZ$ for one $Z\colon \mathsf{FinSet}$, but the construction works for any $Z\colon\mathsf{FinSet}$.

To explain why these diagrams are relevant, I'll describe $w_I$, which I'll call $w\colon PZ\to PZ$ from now on (forget about $w_P$ as well). It is given by scaling the distribution pointwise and then normalising. With $w_Z\colon Z\to\mathbb{N}$ and normalisation $N\colon UZ\to PZ$ (where $UZ$ is the unnormalised monad and $\iota\colon PZ\to UZ$ is the inclusion), $w$ is given by

$$PZ\xrightarrow{\iota} UZ\xrightarrow{\cdot w_Z}UZ\xrightarrow{N}PZ.$$

This construction makes $w$ satisfy the diagram below because normalisation reverses the scaling.
Unit.png

$w$ also satisfies the second diagram. To see this I did calculations similar to those you demonstrated in your recent talk on partial evaluations (https://www.youtube.com/watch?v=ynxfrlqr4I0).
Multiplication.png

view this post on Zulip Paolo Perrone (Mar 24 2021 at 08:40):

How exactly do you scale the distribution pointwise? Could you give an example?

view this post on Zulip Christoph Thies (Mar 24 2021 at 09:10):

Paolo Perrone said:

How exactly do you scale the distribution pointwise? Could you give an example?

For $Z = \{A,B\}$, $w_Z\colon Z\to\mathbb{N}$, and $c_A A + c_B B\colon UZ$ with $c_A, c_B\colon \mathbb{R}_{\geq 0}$,

$$w(c_A A + c_B B) = w_Z(A)\, c_A A + w_Z(B)\, c_B B\colon UZ.$$
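Here is the same computation in code (my own encoding, not from the thread): an unnormalised measure on a finite set is a dict of nonnegative weights, and $w$ is "include, scale pointwise by $w_Z$, normalise".

```python
# Concrete sketch of the scaling-then-normalising construction.
# An unnormalised measure on Z = {A, B} is a dict of nonnegative weights.

def scale(w_Z, m):                   # UZ --(. w_Z)--> UZ, pointwise scaling
    return {z: w_Z[z] * c for z, c in m.items()}

def normalise(m):                    # N : UZ -> PZ (assumes total mass > 0)
    total = sum(m.values())
    return {z: c / total for z, c in m.items()}

def w(w_Z, p):                       # the composite PZ -> UZ -> UZ -> PZ
    return normalise(scale(w_Z, p))  # the inclusion iota is implicit here

w_Z = {'A': 3, 'B': 1}
print(w(w_Z, {'A': 0.5, 'B': 0.5}))  # {'A': 0.75, 'B': 0.25}

# Unit diagram: on a Dirac distribution, normalisation undoes the scaling.
assert w(w_Z, {'A': 1.0}) == {'A': 1.0}
```

The final assertion is the "normalisation reverses the scaling" observation for the unit diagram: scaling a point mass changes only its total weight, which normalisation then divides back out.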

view this post on Zulip Christoph Thies (Mar 24 2021 at 15:16):

But then comes normalisation, and, as I see now, the last diagram is not generally satisfied. That's better, because the following two diagrams seem to say the same thing.
Multiplication.png
MultiplicationString.png

I would now draw the two versions of the process as follows.
CAPAInMonads2.png

In the right-hand version (PA), it is necessary that $w_I$ does not slide out of the tube! In fact, the whole point of the distinction is that in CA, $w_I$ is applied across the metapopulation, and in PA, $w_I$ is applied within the populations.

view this post on Zulip Christoph Thies (Mar 24 2021 at 15:33):

I'm quite convinced about the diagram for the unit, though.
Unit.png
UnitString.png

It says (I think) that there is no mutation or other funny stuff happening in $w$ that creates novel things, i.e., that increases the support of the distribution.

view this post on Zulip Christoph Thies (Mar 24 2021 at 15:36):

What I would still like to say, but don't know how, is that $w_P$ leaves the inner expression unchanged.

view this post on Zulip Christoph Thies (Mar 25 2021 at 08:15):

I got myself into a bit of a pickle with the names, which I'd like to sort out. Below is an overview in which the processes on the right refine the process on the left.
CAPAOverview-1.png

More specifically, there are maps $w_Z\colon Z\to\mathbb{N}$ and $w_{PZ}\colon PZ\to\mathbb{N}$ such that $s_I$ is given by

$$PZ\to UZ\xrightarrow{\cdot w_Z}UZ\xrightarrow{N}PZ$$

and $s_P$ is given by

$$PPZ\to UPZ\xrightarrow{\cdot w_{PZ}}UPZ\xrightarrow{N}PPZ.$$

Moreover, $s_I\colon PZ\to PZ$ and $s_P\colon PPZ\to PPZ$ satisfy the diagrams below.
Unit1.png
Unit2.png
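As a sanity check on what $s_P$ does (again a toy encoding of mine, not from the thread): an element of $PPZ$ can be stored as a list of (inner distribution, outer weight) pairs, and $s_P$ rescales only the outer weights by $w_{PZ}$ before renormalising, passing the inner distributions through untouched.

```python
# Toy sketch of s_P : PPZ -> PPZ as "scale the outer weights, renormalise".
# A distribution over distributions is a list of (inner_dist, outer_weight).

def s_P(w_PZ, pp):                   # PPZ -> UPZ -> UPZ -> PPZ
    scaled = [(q, w_PZ(q) * c) for q, c in pp]
    total = sum(c for _, c in scaled)
    return [(q, c / total) for q, c in scaled]

# Example weighting (hypothetical): score each inner distribution by
# ten times its mass on 'A'.
w_PZ = lambda q: 10 * q.get('A', 0)

pp = [({'A': 1.0}, 0.5), ({'A': 0.5, 'B': 0.5}, 0.5)]
out = s_P(w_PZ, pp)

# The outer weights change (to 2/3 and 1/3) ...
assert abs(out[0][1] - 2/3) < 1e-12 and abs(out[1][1] - 1/3) < 1e-12
# ... but the inner distributions are exactly the ones we started with.
assert [q for q, _ in out] == [q for q, _ in pp]
```

The last assertion is the point: by construction, $s_P$ only reweights the outer layer, so the inner distributions come out unchanged.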

view this post on Zulip Christoph Thies (Mar 25 2021 at 09:01):

The latter equality seems to say that $s_P$ leaves the inner distributions unchanged.