Category Theory
Zulip Server
Archive

You're reading the public-facing archive of the Category Theory Zulip server.
To join the server you need an invite. Anybody can get an invite by contacting Matteo Capucci at name dot surname at gmail dot com.
For all things related to this archive refer to the same person.


Stream: learning: questions

Topic: Probability dual


view this post on Zulip Hakimi Rashid (Dec 07 2020 at 15:29):

I have been reading about Markov categories... From what I understand, a Markov category is a semicartesian monoidal category where every object carries a comonoid structure. So objects have morphisms like copy, which is the comultiplication, and delete, which is the counit, right?
1) It seems to me there are also unit morphisms \eta_X : I \to X, which give probability distributions on X. Just curious: what would it be if we equipped objects with monoid structures? What would \mu : X \otimes Y \to XY do?
2) Regarding the delete morphism Del_X : X \to I: does this morphism discard the whole datum, or just the probability distribution on X, so that you get back the sample?
3) Does probability come with any dual structure akin to the dual vector space?

view this post on Zulip Javier Prieto (Dec 09 2020 at 10:42):

Indeed, unit morphisms exist and have that interpretation; see page 11 in Fritz19. If you want a monoid structure mirroring the comonoid, the multiplication should have type \mu : X \otimes X \to X for every object X - I think you can't just pair any two objects, because what would XY even mean?

view this post on Zulip Hakimi Rashid (Dec 09 2020 at 14:36):

I guess I was thinking along the lines of defining statistical distances, like the KL divergence or maybe the Fisher information metric, to compare two distributions within a Markov category...

view this post on Zulip Javier Prieto (Dec 10 2020 at 12:29):

I assume you can define the KL divergence for any two parallel arrows p, q : I \to X, but I don't think that's been done in this framework (yet)
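To make this concrete: in \mathrm{FinStoch} such parallel arrows p, q : I \to X are just probability vectors on a finite set X, and the KL divergence is the familiar sum. (The sketch below is an editorial illustration, not from any of the papers discussed here; all names are made up.)

```python
import math

def kl_divergence(p, q):
    """D(p || q) = sum_x p(x) * log(p(x)/q(x)) for two distributions
    on the same finite set, i.e. two parallel arrows I -> X in FinStoch.
    Returns infinity when p puts mass where q does not."""
    total = 0.0
    for px, qx in zip(p, q):
        if px == 0:
            continue  # 0 * log(0/q) = 0 by convention
        if qx == 0:
            return math.inf  # absolute continuity fails
        total += px * math.log(px / qx)
    return total

p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl_divergence(p, q))  # ≈ 0.5108, strictly positive since p != q
print(kl_divergence(p, p))  # 0.0
```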

view this post on Zulip Hakimi Rashid (Dec 11 2020 at 03:06):

Thank you for your reply. Would this work: we define \epsilon_X : X \to I as a generalized version of Del_X, i.e. it maps X to scalars instead of deleting, just like the counit in Hilb, \epsilon_H : H \to \mathbb{C}. Then, the KL divergence would be \epsilon \circ \mu \circ (p \otimes q) : I \to I?

Do we need the multiplication \mu : X \otimes X \to X? Does it define a joint probability? Would dropping \mu from the above definition, i.e. \epsilon \circ (p \otimes q) : I \to I, do the same job?

view this post on Zulip Javier Prieto (Dec 11 2020 at 08:53):

A construction along those lines might work in \mathrm{Hilb} because the monoidal unit is \mathbb{C}, but in \mathrm{FinStoch}, for example, the monoidal unit is the one-element set, so you cannot "map to scalars" - the discard map is unique.

Joint probabilities on the pair (X, Y) are defined as arrows p : I \to X \otimes Y - at least in \mathrm{FinStoch}, but I believe this is true in any Markov category.
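As a small concrete sketch (an editorial illustration with made-up data): in \mathrm{FinStoch} a joint state p : I \to X \otimes Y is a probability vector indexed by pairs, and the two marginals are obtained by composing with id \otimes del and del \otimes id.

```python
# A joint distribution I -> X (x) Y in FinStoch: a dict over pairs.
joint = {("rain", "wet"): 0.3, ("rain", "dry"): 0.1,
         ("sun", "wet"): 0.05, ("sun", "dry"): 0.55}

def marginal(p, axis):
    """Compose with the discard map on the other factor:
    axis=0 keeps X (postcompose with id_X (x) del_Y), axis=1 keeps Y."""
    out = {}
    for pair, prob in p.items():
        out[pair[axis]] = out.get(pair[axis], 0.0) + prob
    return out

print(marginal(joint, 0))  # X-marginal: rain ≈ 0.4, sun ≈ 0.6
print(marginal(joint, 1))  # Y-marginal: wet ≈ 0.35, dry ≈ 0.65
```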

view this post on Zulip Tobias Fritz (Dec 11 2020 at 09:19):

Hi both! I was just about to write a similar reply, so thanks for having mentioned that already @Javier Prieto (I'm unfortunately too busy these days to spend much time here...)

Quite generally, I think that the best methodology in applied category theory is to look at and understand the mathematical structures in the applied context, probability in this case, and then abstract to categorical structures from there. So in the case of KL divergence, I don't think that a description of the form \epsilon \circ \mu \circ (p \otimes q) will exist, since I don't know what the individual meanings of \epsilon and \mu in probability theory would be. At least one of them would have to be non-linear, since KL divergence is non-linear, which is at odds with the fact that morphisms in Markov categories (and variants thereof) are taken to be linear in the probabilities. Perhaps one can entertain categories with morphisms that act non-linearly, but then one needs to answer the question of what their probabilistic meaning and significance is.

One possible use for monoid structures \mu : X \otimes X \to X would be to axiomatize categories of monoid-valued random variables. For example, with real-valued random variables, whose distributions are morphisms I \to \mathbb{R}, we may be interested in adding two of them. This amounts to composing their joint distribution with + : \mathbb{R} \otimes \mathbb{R} \to \mathbb{R}. I don't think that anyone has thought about such categories yet, and prior to doing so one should have an idea of what one is trying to achieve with them. (In many probability theory statements, like the laws of large numbers, one is interested in averaging a bunch of random variables rather than adding them. Although averaging and adding of real numbers are operations which differ only by a factor of 1/n, I've come to regard their categorical generalizations as quite different kinds of beasts: addition is captured by monoid structures, while averaging has more to do with representable Markov categories.)
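A minimal sketch of that last composition (an editorial illustration; the finite supports and names are made up): pushing a joint distribution I \to \mathbb{R} \otimes \mathbb{R} forward along the deterministic addition channel gives the distribution of the sum.

```python
def pushforward_sum(joint):
    """Compose a joint distribution on pairs of numbers with the
    deterministic addition channel + : R (x) R -> R, yielding
    the distribution of X + Y."""
    out = {}
    for (x, y), prob in joint.items():
        out[x + y] = out.get(x + y, 0.0) + prob
    return out

# Two independent fair coins taking values 0 and 1:
joint = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
print(pushforward_sum(joint))  # {0: 0.25, 1: 0.5, 2: 0.25}
```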

Hopefully some of this gives some additional insight beyond what Javier already said.

view this post on Zulip Hakimi Rashid (Dec 11 2020 at 12:42):

Thank you @Tobias Fritz and @Javier Prieto for your replies. Does this mean that we cannot define any statistical distance at all in a Markov category? Or is there some statistical distance that is 'natural' and definable in a Markov category?

If, let's say, we want a categorical framework for statistical distances, do we need to start from scratch and define a new category, or can we add structure to a Markov category, by somehow enriching it, in order to capture notions of statistical distance?

view this post on Zulip Javier Prieto (Dec 11 2020 at 15:31):

This is not an answer but it's pointing at one: you may find this stream and the links therein interesting. In particular, there is this paper

John C. Baez, Tobias Fritz, A Bayesian Characterization of Relative Entropy

in which relative entropy is defined as a functor from \mathrm{FinStat} to [0, \infty]. The definition of \mathrm{FinStat} is a bit involved, but I think it embeds into \mathrm{FinStoch} because the morphisms are essentially certain diagrams involving stochastic maps. I don't know if this functor can be extended/defined on \mathrm{FinStoch}.

view this post on Zulip Tobias Fritz (Dec 11 2020 at 16:48):

I don't have a definite answer on the last question either, but very roughly: distances are real numbers, and finding any general categorical construction which outputs real numbers is tricky. It would have to involve some kind of limit. Assuming a suitable kind of enrichment would indeed be the easier and perhaps more sensible thing to do, and I would hope that many of the existing results for Markov categories can be generalized to the enriched case, and thereby apply in situations where one wants to talk about e.g. approximate equality of distributions.

On the other hand, although getting intrinsic categorical notions of distance is difficult, @Eigil Rischel has recently proposed an intrinsic topology on the hom-sets of every Markov category (satisfying some mild conditions). That this is possible has been very surprising to me. It's currently work in progress, so I'm afraid that I can't say more at this point, simply because we don't really know much more.

view this post on Zulip Hakimi Rashid (Dec 12 2020 at 05:08):

Thank you @Javier Prieto for sharing those resources and @Tobias Fritz for sharing the current progress on this issue. I really hope and pray that these will be sorted out and published soon, because most of my current work can be captured by Markov categories, I think.

view this post on Zulip Hakimi Rashid (Dec 13 2020 at 02:28):

I am reading the paper https://arxiv.org/pdf/1709.00322.pdf, 'Disintegration and Bayesian Inversion via String Diagrams' by Cho and Jacobs. In the paper, in Section 7, 'Beyond causal channels', they mention 'enlarging' the CD category, which enables notions of scalars and effects p : X \to I. What do these concepts mean intuitively in a probability setting? Can we use them somehow to define KL divergence?

view this post on Zulip Tobias Fritz (Dec 13 2020 at 08:59):

That just refers to including probability distributions and channels which are not normalized. When they say "causal", they're referring to a categorical formulation and generalization of the normalization of probability.

view this post on Zulip Hakimi Rashid (Dec 13 2020 at 14:41):

After reading your paper with @John Baez on the relative entropy functor, I've come to realize that any distance function is a function that kind of lives in between two categories, if I understand it correctly. The input of the function lives in one category, for example a Markov category in the case of probability distributions, and the output lives in the realm of the monoid of real numbers. An exception would be vector spaces, because vectors and the output of an inner product can live in the same category. Is this right?

view this post on Zulip Tobias Fritz (Dec 13 2020 at 15:13):

That's one way to do it, and it's one which seems to capture entropy and KL-divergence particularly accurately. But it's also possible to work with categories enriched in metric spaces, so that one has a given measure of distance between any two morphisms (between the same two objects). For example the category of channels has a canonical enrichment like this if one measures the distance between two channels by (the channel generalization of) the total variation distance. Also Wasserstein distance can be treated like this. Of course, the difference is that these distances are then additional structures on the category and not canonically determined from the categorical structure alone.
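A minimal sketch of that canonical enrichment (an editorial illustration with made-up channels), representing a channel X \to Y in \mathrm{FinStoch} as a list of rows, one output distribution per input: the distance between two channels is the worst-case total variation distance over inputs.

```python
def tv_distance(p, q):
    """Total variation distance between two finite distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def channel_tv(f, g):
    """Distance between two channels X -> Y (one row per input):
    the worst-case TV distance over inputs. This gives each hom-set
    of FinStoch a metric, i.e. an enrichment in metric spaces."""
    return max(tv_distance(fx, gx) for fx, gx in zip(f, g))

f = [[0.9, 0.1], [0.2, 0.8]]   # channel {a, b} -> {0, 1}
g = [[0.7, 0.3], [0.2, 0.8]]   # differs from f only on input 'a'
print(channel_tv(f, g))  # ≈ 0.2
```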

view this post on Zulip Hakimi Rashid (Dec 13 2020 at 15:17):

Can we then just abstract this notion of distance and similarity and capture it in a single categorical construct that can cover all distance measures?

view this post on Zulip Hakimi Rashid (Dec 13 2020 at 15:19):

Would structured cospans do the job?

view this post on Zulip Tobias Fritz (Dec 13 2020 at 15:26):

Categories enriched in metric spaces are indeed a single categorical concept which covers a lot of distance measures. (KL divergence is one of the exceptions, due to the failure of the triangle inequality.) But this observation in itself doesn't really do much, since it's just a definition. It's about the same as saying that the definition of metric space covers all distance measures. That's true, but has little use in itself; it's more like a starting point for the development of a theory.
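To illustrate the parenthetical point numerically (an editorial sketch with made-up distributions): KL divergence can violate the triangle inequality, which is why it does not fit directly into enrichment in metric spaces.

```python
import math

def kl(p, q):
    """KL divergence between two finite distributions with full support."""
    return sum(px * math.log(px / qx) for px, qx in zip(p, q) if px > 0)

p = [0.5, 0.5]
q = [0.9, 0.1]
r = [0.99, 0.01]

# The triangle inequality fails: D(p||r) > D(p||q) + D(q||r).
print(kl(p, r))               # ≈ 1.614
print(kl(p, q) + kl(q, r))    # ≈ 0.655
```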

I don't see any relation to structured cospans, but perhaps someone who has worked with those can say more.

view this post on Zulip John Baez (Dec 14 2020 at 07:07):

Not me.

view this post on Zulip Hakimi Rashid (Dec 15 2020 at 15:02):

Oh... I was thinking about a construction akin to the functor category [-,-], in which the slots are not fixed but can be filled with various compatible categories. So we could have a cospan like so: i : X \to [-,-] \gets Y : o, which results in the functor category [X,Y]. Is there such a thing? Or is it total nonsense?

view this post on Zulip Morgan Rogers (he/him) (Dec 15 2020 at 15:06):

It doesn't parse - that is, it doesn't type-check. [-,-] isn't a category if it doesn't have anything plugged into it, so what kind of morphisms should i and o be?

view this post on Zulip Hakimi Rashid (Dec 15 2020 at 15:11):

OK, it is total nonsense then. Is there any categorical construction that can achieve a similar goal? Or is the goal itself not achievable?

view this post on Zulip Hakimi Rashid (Dec 15 2020 at 15:40):

What if I define a 2-category with the following diagram, i : X \to [X,Y] \gets Y : o, and then define a 2-functor to another 2-category? Will it achieve the goal of having variable categories?

view this post on Zulip Morgan Rogers (he/him) (Dec 15 2020 at 15:58):

In general, we can think of the construction [-,-] as a 2-functor \mathbf{Cat}^{\mathrm{op}} \times \mathbf{Cat} \to \mathbf{Cat}.

view this post on Zulip Morgan Rogers (he/him) (Dec 15 2020 at 16:00):

Hakimi Rashid said:

What if I define a 2-category with the following diagram, i : X \to [X,Y] \gets Y : o, and then define a 2-functor to another 2-category? Will it achieve the goal of having variable categories?

You likely can't do this with cospans in a nice way, since if X is a non-empty category and Y is the empty category, then [X,Y] is also empty, so no functor i : X \to [X,Y] even exists!

view this post on Zulip Hakimi Rashid (Dec 15 2020 at 16:03):

If we restrict ourselves to compatible categories, would the construction work?

view this post on Zulip Morgan Rogers (he/him) (Dec 15 2020 at 16:08):

What do you expect your functors i and o to do? More generally, do you have an aim that's not accomplished by understanding [-,-] itself as a 2-functor, without the cospan attached?

view this post on Zulip Morgan Rogers (he/him) (Dec 15 2020 at 16:08):

(That's not intended to be confrontational, I just want more of an insight into where you're going with this :grinning_face_with_smiling_eyes: )

view this post on Zulip Hakimi Rashid (Dec 15 2020 at 16:11):

Hakimi Rashid said:

After reading your paper with John Baez on the relative entropy functor, I've come to realize that any distance function is a function that kind of lives in between two categories, if I understand it correctly. The input of the function lives in one category, for example a Markov category in the case of probability distributions, and the output lives in the realm of the monoid of real numbers. An exception would be vector spaces, because vectors and the output of an inner product can live in the same category. Is this right?

The reason is that I want a construction that generalizes distance and divergence along this line of thinking.

view this post on Zulip Hakimi Rashid (Dec 15 2020 at 16:13):

The i functor is the input and o is the output.

view this post on Zulip Morgan Rogers (he/him) (Dec 15 2020 at 17:40):

In a typical distance situation you have a "pairing" \langle -, - \rangle which takes a pair of "things" X and Y and outputs a "value" \langle X, Y \rangle. When X and Y are members of spaces or categories, we might impose the restriction that the pairing is natural in its variables, which is to say that varying X and Y gives a comparable variation in the value of \langle X, Y \rangle.
However, I don't know what situation you're imagining where there are mappings/transformations X \to \langle X, Y \rangle \gets Y; that's what I want to understand. You don't need these mappings in order for the pairing to make sense; you can just start with the pair (X, Y) consisting of the input and output, without requiring that those maps/morphisms exist.

view this post on Zulip Hakimi Rashid (Dec 16 2020 at 12:20):

https://arxiv.org/pdf/1402.3067.pdf and https://www.cs.mcgill.ca/~prakash/Pubs/ccre.pdf
This is because I wanted to follow the approach taken by the papers above. They define the KL divergence as a functor between two categories. I want to generalize this to other distance measures as well, which might take different categories as input and output. I think what I should use is a category of functor categories, right?

The pairing \langle -, - \rangle is a construction akin to an inner product, right? Both X and Y are the inputs and \langle X, Y \rangle is the output. If I understand their paper correctly, X should be the input category, Y the output category, and [X, Y] the functor category where the KL divergence lives.

view this post on Zulip Hakimi Rashid (Dec 16 2020 at 13:48):

So, if this is correct, then we can describe the inner product as follows: [\mathrm{Vect}, \mathrm{Vect}], or i : \mathrm{Vect} \to [\mathrm{Vect}, \mathrm{Vect}] \gets \mathrm{Vect} : o. Right?

view this post on Zulip Morgan Rogers (he/him) (Dec 16 2020 at 14:08):

No, inner products act on elements within vector spaces. So in that case, for a specific vector space V, an inner product is a bilinear map V \times V \to \mathbb{R}. It's not an operation on the ordinary category of vector spaces.

view this post on Zulip Hakimi Rashid (Dec 16 2020 at 14:37):

But if we think of \mathbb{R} as a 1-dimensional vector space, can we somehow define the bilinear map using an endofunctor? I was thinking of it as paraphrasing the definition in terms of a higher abstraction.

view this post on Zulip Morgan Rogers (he/him) (Dec 16 2020 at 17:57):

Hakimi Rashid said:

But if we think of \mathbb{R} as a 1-dimensional vector space, can we somehow define the bilinear map using an endofunctor? I was thinking of it as paraphrasing the definition in terms of a higher abstraction.

But there can be many inner products on a given vector space (or, more generally, pairings between vector spaces), and linear maps do not necessarily respect them or canonically extend them, so the construction of inner products is not functorial.

view this post on Zulip Hakimi Rashid (Dec 17 2020 at 07:00):

I don't understand. Would you kindly elaborate on inner products not being functorial? And also on the issue of there being many inner products.

Why can't the construction [\mathrm{Vect}, \mathbb{R}] subsume the inner product?

view this post on Zulip Morgan Rogers (he/him) (Dec 17 2020 at 11:02):

On making inner products functorial: the first problem is that there is no canonical way to equip a vector space V with an inner product. One way to resolve this is to work instead with vector spaces equipped with a basis, since in that case there is a canonical inner product making that basis orthonormal. Next, one has the task of deciding which morphisms to choose: a morphism f : (V_1, \mathcal{B}_1) \to (V_2, \mathcal{B}_2) should consist of a linear map, but how should that interact with the basis, bearing in mind that we want the functor to send morphisms to transformations between inner product spaces? Well, one way that we can express that a linear map "respects the inner product" is that it should end up satisfying the equation \langle f(x), f(y) \rangle_2 = \langle x, y \rangle_1, where \langle -, - \rangle_1 and \langle -, - \rangle_2 are the inner products on the respective spaces. In other words, we want the linear maps f whose matrix with respect to the bases \mathcal{B}_1 and \mathcal{B}_2 has orthonormal columns in \mathbb{R}^{|\mathcal{B}_2|}. The result is a category of vector spaces with bases on which we can functorially assign inner products, but it no longer looks very much like \mathrm{Vect}, since for example there are no morphisms from higher-dimensional spaces to lower-dimensional spaces.
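A tiny Python sketch of that orthonormal-columns criterion (an editorial illustration, using plain lists for 2-dimensional spaces; all names are made up): a linear map preserves the canonical inner products precisely when its matrix has orthonormal columns.

```python
import math

def dot(u, v):
    """Canonical inner product of two coordinate vectors."""
    return sum(a * b for a, b in zip(u, v))

def apply(matrix, v):
    """Apply a matrix (list of rows) to a vector."""
    return [dot(row, v) for row in matrix]

def has_orthonormal_columns(matrix, tol=1e-9):
    """Check <c_i, c_j> = delta_ij for the columns of the matrix."""
    cols = list(zip(*matrix))
    return all(abs(dot(c1, c2) - (i == j)) < tol
               for i, c1 in enumerate(cols)
               for j, c2 in enumerate(cols))

theta = 0.3
rotation = [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta),  math.cos(theta)]]
shear = [[1.0, 1.0], [0.0, 1.0]]

x, y = [1.0, 2.0], [3.0, -1.0]
# The rotation preserves <x, y>; the shear does not.
print(has_orthonormal_columns(rotation))  # True
print(has_orthonormal_columns(shear))     # False
print(dot(apply(rotation, x), apply(rotation, y)), dot(x, y))  # equal
```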

view this post on Zulip Morgan Rogers (he/him) (Dec 17 2020 at 11:05):

Note in particular that if you had wanted to view \mathrm{Vect} as a category enriched over itself, this restricted category is no longer \mathrm{Vect}-enriched.

view this post on Zulip Morgan Rogers (he/him) (Dec 17 2020 at 11:12):

As for "the construction [\mathrm{Vect}, \mathbb{R}]": we can make \mathbb{R} into an ordinary category (by equipping it with morphisms representing the ordering), but this structure is not particularly compatible with its vector space structure, and in particular any functor from \mathrm{Vect} into this category is constant, so reading that as a functor category won't give you anything resembling an inner product. We could instead interpret \mathbb{R} as a one-object category enriched over the category of real vector spaces, but then [\mathrm{Vect}, \mathbb{R}] consists of functors, each of which necessarily maps every vector space to that one object and provides a linear map from each vector space of linear maps V \to W to \mathbb{R}. That's something similar to an inner product, but still not quite what you're looking for, I think.

view this post on Zulip Hakimi Rashid (Dec 17 2020 at 11:48):

wow, thank you for your explanation. I need some time to unpack the whole thing. Looks like there's a lot of homework I need to do just to understand your answer.

view this post on Zulip Morgan Rogers (he/him) (Dec 17 2020 at 12:01):

When understanding CT, it helps to unpack all of the definitions, if nothing else to make sure that something you've written makes sense! Good luck :grinning:

view this post on Zulip Hakimi Rashid (Dec 17 2020 at 13:25):

[Mod] Morgan Rogers said:

On making inner products functorial: the first problem is that there is no canonical way to equip a vector space V with an inner product. One way to resolve this is to work instead with vector spaces equipped with a basis, since in that case there is a canonical inner product making that basis orthonormal. Next, one has the task of deciding which morphisms to choose: a morphism f : (V_1, \mathcal{B}_1) \to (V_2, \mathcal{B}_2) should consist of a linear map, but how should that interact with the basis, bearing in mind that we want the functor to send morphisms to transformations between inner product spaces? Well, one way that we can express that a linear map "respects the inner product" is that it should end up satisfying the equation \langle f(x), f(y) \rangle_2 = \langle x, y \rangle_1, where \langle -, - \rangle_1 and \langle -, - \rangle_2 are the inner products on the respective spaces. In other words, we want the linear maps f whose matrix with respect to the bases \mathcal{B}_1 and \mathcal{B}_2 has orthonormal columns in \mathbb{R}^{|\mathcal{B}_2|}. The result is a category of vector spaces with bases on which we can functorially assign inner products, but it no longer looks very much like \mathrm{Vect}, since for example there are no morphisms from higher-dimensional spaces to lower-dimensional spaces.

If I understand this part, the claim that [\mathrm{Vect}, \mathrm{Vect}] captures the inner product will only be true for a very limited subset of \mathrm{Vect}, right?

view this post on Zulip Hakimi Rashid (Dec 17 2020 at 14:07):

[Mod] Morgan Rogers said:

We could instead interpret \mathbb{R} as a one-object category enriched over the category of real vector spaces, but then [\mathrm{Vect}, \mathbb{R}] consists of functors, each of which necessarily maps every vector space to that one object and provides a linear map from each vector space of linear maps V \to W to \mathbb{R}. That's something similar to an inner product, but still not quite what you're looking for, I think.

I think this is quite similar to what they define for the relative entropy functor in those papers, right?

view this post on Zulip Morgan Rogers (he/him) (Dec 17 2020 at 17:00):

Hakimi Rashid said:

I think this is quite similar to what they define for the relative entropy functor in those papers, right?

No, their relative entropy is a specific functor; [\mathrm{Vect}, \mathbb{R}] is a whole category of functors.

view this post on Zulip Morgan Rogers (he/him) (Dec 17 2020 at 17:04):

Hakimi Rashid said:

If I understand this part, the claim that [\mathrm{Vect}, \mathrm{Vect}] captures the inner product will only be true for a very limited subset of \mathrm{Vect}, right?

It seems like this notation does not mean what you think it means. [\mathrm{Vect}, \mathrm{Vect}] means either the category of ordinary endofunctors of \mathrm{Vect}, or the category of \mathrm{Vect}-enriched endofunctors of \mathrm{Vect}. I don't think either of these things in any way "captures inner product".

view this post on Zulip John Baez (Dec 17 2020 at 20:35):

Yeah, [Vect, Vect] has nothing much to do with "inner product".

view this post on Zulip Hakimi Rashid (Dec 18 2020 at 03:08):

Oh, I get it. If I try to generalize by simply going higher in the hierarchy of abstraction, then it will carry with it a whole lot of other baggage that is unnecessary and unrelated to the concepts I am trying to generalize.

view this post on Zulip Hakimi Rashid (Dec 18 2020 at 03:16):

I thought I could generalize both of these and other statistical distances by following the pattern laid out in the papers. Has this ever been done before? Or is it something that is not possible?

view this post on Zulip Hakimi Rashid (Dec 18 2020 at 13:16):

If we restrict ourselves to only statistical distances, and exclude vector spaces and inner products, then we can still capture many other statistical distances / similarity measures, such as correlations, KL divergences, Wasserstein distance, etc., using functor categories, right?

view this post on Zulip Hakimi Rashid (Dec 18 2020 at 13:25):

So the construction i : X \to [X,Y] \gets Y : o still applies. We just need to define compatible categories X and Y. Is this correct?

view this post on Zulip Morgan Rogers (he/him) (Dec 18 2020 at 15:22):

Hakimi Rashid said:

I thought I could generalize both of these and other statistical distances by following the pattern laid out in the papers.

But the thing you keep describing doesn't look anything like what appears in the papers you mentioned!!

view this post on Zulip Morgan Rogers (he/him) (Dec 18 2020 at 15:27):

In the papers, we have...

view this post on Zulip Morgan Rogers (he/him) (Dec 18 2020 at 15:28):

There is just one rather special functor in each situation. The notation [0,\infty] is classical notation for "the positive real numbers, with infinity", rather than a functor category, which might have confused you?

view this post on Zulip Hakimi Rashid (Dec 18 2020 at 15:38):

Oh no. I am really confused now. I need to dissect the whole thing again to pinpoint where I might have gone wrong. The KL divergence in their paper is a functor, right?

view this post on Zulip Hakimi Rashid (Dec 18 2020 at 15:45):

A functor between the input category (in the papers, \mathrm{FinStat} and \mathrm{SbStat}) and the output category (the 'measure category').

view this post on Zulip Hakimi Rashid (Dec 18 2020 at 15:46):

Since it is a specific functor, it lives in the functor category [input, output].

view this post on Zulip Hakimi Rashid (Dec 18 2020 at 15:50):

Hence, if I want to define other measures (e.g. correlation, Wasserstein, etc.) categorically, I can follow their 'formula' by defining, for each measure, a functor from a specific category to a compatible measure category.

view this post on Zulip Hakimi Rashid (Dec 18 2020 at 15:51):

so far... anything wrong?

view this post on Zulip Hakimi Rashid (Dec 18 2020 at 15:54):

Every specific measure is a specific functor between specific categories. They live in different functor categories.

view this post on Zulip Hakimi Rashid (Dec 18 2020 at 15:57):

I thought that if I could have a construction where I can vary the input category and the output category, and thus the functor category, I could generalize the definition, of which the relative entropy functor is one specific example.

view this post on Zulip Morgan Rogers (he/him) (Dec 18 2020 at 16:00):

Yes, that's all correct, but

  1. There are no functors i and o that one can meaningfully extract here, as far as I can tell.
  2. The properties of the specific functor are not presented categorically. That is, they show that their functor is the unique functor satisfying some properties, but these properties are not stated in terms of the functor category [\mathbf{FinStat}, [0,\infty]]. That's not to say that they couldn't be; it's just extra work you may have to do.

view this post on Zulip Morgan Rogers (he/him) (Dec 18 2020 at 16:03):

Rather than considering the whole functor category (which probably contains a lot more stuff than you need, yet in which, on the other hand, it's hard to find anything much!), a good approach would be to identify what features make their construction work. What makes a good "input category" or "output category"? What features make this approach work? In particular, which features are shared with the other examples that you want to consider?

view this post on Zulip Hakimi Rashid (Dec 18 2020 at 16:07):

By that, do you mean it is possible to have a single categorical construct that can be used to define them all? Or is there no such thing, so that I need to define them one by one?

view this post on Zulip Morgan Rogers (he/him) (Dec 18 2020 at 16:08):

There could well be a single construction that works for many of them; it's all about finding the right level of generality :grinning_face_with_smiling_eyes:

view this post on Zulip Hakimi Rashid (Dec 18 2020 at 16:10):

Ok. understood.

view this post on Zulip Morgan Rogers (he/him) (Dec 18 2020 at 16:18):

I appreciate that wasn't a very specific pointer to what you should do! Concretely, I'd say you have enough work ahead if you just focus on defining categories on which the measures make sense, categories which are suitable for measuring, and concrete examples of functors between them, so I'm just discouraging you from dealing with whole functor categories before you're ready. It might be that the properties featuring in the article (semicontinuous, convex linear, etc.) have interesting translations into properties of the corresponding objects in [\mathbf{FinStat}, [0,\infty]], and I personally would be very interested to discover if that is the case, but finding that out might not be the most direct route to the generalisation you're seeking.

view this post on Zulip Hakimi Rashid (Dec 19 2020 at 07:06):

Thank you for your tips and guidance. So, this has never been done before? Or do you know someone who has worked / is currently working on it? I was actually hoping that this had been done before, so that I could use the result as part of my work.

view this post on Zulip Hakimi Rashid (Dec 31 2020 at 15:10):

Hi again. I was wondering... Can we define characteristic function of a random variable within Markov category?

view this post on Zulip Tobias Fritz (Dec 31 2020 at 15:25):

Briefly, I would say that we can't define characteristic functions in Markov categories just yet. Markov categories are much more general than plain old probability theory. Characteristic functions on the other hand are a concept rather specific to real-valued random variables in the traditional sense (although a very powerful concept for sure). That's why I think that there won't be something like a characteristic function in general Markov cats.

But it's entirely conceivable that there is something like a characteristic function for states in certain Markov categories on objects which "look like" the real numbers. I'm pretty sure though that nobody knows how to do this so far, so it's one of the many open questions.

It's also possible that there will be another concept for suitable Markov categories which is different from the characteristic function, but can replace the latter for some of its purposes, such as in the proof of the central limit theorem. But all of this is speculation at the moment, so just take it as a description of the scope of conceivable possibilities.

view this post on Zulip Hakimi Rashid (Dec 31 2020 at 22:49):

OK, thank you for your feedback @Tobias Fritz. Another question, if I may... From what I have seen so far, deterministic maps are defined as maps that respect copying, but in string diagrams they are drawn the same as probabilistic ones. So how can we differentiate between the two? Is it just by looking at the context in which they occur?

view this post on Zulip John Baez (Dec 31 2020 at 23:27):

It's easy to write down a string diagram that expresses the fact that a map respects copying. I guess if you want you can draw such maps in a different color or something.
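That equation can also be checked numerically in \mathrm{FinStoch} (an editorial sketch; the channels below are made up): a channel f satisfies copy \circ f = (f \otimes f) \circ copy exactly when f(-|x) is a point mass for every input x, i.e. when f is deterministic.

```python
def respects_copy(channel, tol=1e-9):
    """Check copy . f == (f (x) f) . copy for a channel given as
    {x: {y: prob}}. Both sides are maps X -> Y (x) Y; they agree
    exactly when every f(-|x) is a point mass (f is deterministic)."""
    for x, fx in channel.items():
        for y1, p1 in fx.items():
            for y2, p2 in fx.items():
                lhs = p1 if y1 == y2 else 0.0   # copy after f
                rhs = p1 * p2                   # (f tensor f) after copy
                if abs(lhs - rhs) > tol:
                    return False
    return True

det = {"a": {"y": 1.0}, "b": {"z": 1.0}}                 # a function
noisy = {"a": {"y": 0.5, "z": 0.5}, "b": {"z": 1.0}}     # genuinely random
print(respects_copy(det))    # True
print(respects_copy(noisy))  # False
```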

view this post on Zulip Hakimi Rashid (Dec 31 2020 at 23:36):

Oh, I was thinking about doing something like that... but probably my question should be: why haven't people done that so far? Is it because you can just look at the context, so there is no need to differentiate them diagrammatically?

view this post on Zulip John Baez (Dec 31 2020 at 23:42):

my question should be: why haven't people done that so far?

I don't know why or even if they haven't, but go ahead and do it - if you're the first, maybe people will use your notation.

view this post on Zulip Hakimi Rashid (Dec 31 2020 at 23:49):

John Baez said:

I don't know why or even *if* they haven't, but go ahead and do it - if you're the first, maybe people will use your notation.

I haven't seen it... but my knowledge is very limited. Maybe @Tobias Fritz knows better.

view this post on Zulip Nathaniel Virgo (Jan 01 2021 at 04:47):

There was some brief discussion on this topic here. (I'm in favour of graphically distinguishing stochastic morphisms from deterministic ones, as I think it makes things easier to follow, but Tobias gave some reasonable points against it.)

view this post on Zulip Hakimi Rashid (Jan 01 2021 at 05:02):

Thank you @Nathaniel Virgo for the input.

view this post on Zulip Hakimi Rashid (Jan 01 2021 at 07:02):

I'm trying to adapt the relative entropy construction from https://arxiv.org/pdf/1402.3067.pdf and https://www.cs.mcgill.ca/~prakash/Pubs/ccre.pdf using Markov categories. In those papers, a morphism between objects in FinStat is a pair (f,s) : (X,p) → (Y,q), and the RE functor sends it to S(p, s∘q). Why do we need the f morphism? Can we do it with just s, leaving out f?

view this post on Zulip Tobias Fritz (Jan 01 2021 at 07:52):

Quite generally, John's point is probably the best: if you invent a piece of notation and like it, then go ahead and use it! If it's useful and makes things more intelligible to others, then others will start using it too.

During the writing of our latest paper, we had indeed considered using a separate notation for deterministic morphisms. As in the discussion that @Nathaniel Virgo has linked to, there are advantages and disadvantages, and in the end we decided against doing it. But I'd be curious to see a paper which does it, in order to see how it pans out in practice.

view this post on Zulip Tobias Fritz (Jan 01 2021 at 08:13):

Hakimi Rashid said:

I'm trying to adapt the relative entropy construction from https://arxiv.org/pdf/1402.3067.pdf and https://www.cs.mcgill.ca/~prakash/Pubs/ccre.pdf using Markov categories. In those papers, a morphism between objects in FinStat is a pair (f,s) : (X,p) → (Y,q), and the RE functor sends it to S(p, s∘q). Why do we need the f morphism? Can we do it with just s, leaving out f?

Interesting question! Perhaps there's a way to do without it, which would be very interesting to see. But in S(p, s∘q), the role played by the map f is that it is measure-preserving between p and q, meaning that q = f∘p, which you can regard as the definition of q. Then you may just as well write S(p, s∘f∘p). So when putting it like this, it's actually the q which is not needed!
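For finite distributions this identity is easy to check numerically. Below is a minimal Python sketch (all distributions and matrices are made-up illustrations; channels are encoded as column-stochastic matrices, so f∘p becomes a matrix-vector product):

```python
import numpy as np

def relative_entropy(p, r):
    """S(p, r) = sum_x p(x) * log(p(x) / r(x)), with the convention 0 * log 0 = 0."""
    p, r = np.asarray(p, float), np.asarray(r, float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / r[mask])))

# A finite example: X = {0, 1, 2}, Y = {0, 1}.
p = np.array([0.5, 0.3, 0.2])          # "true" distribution on X

# f : X -> Y deterministic, encoded as a 0/1 column-stochastic matrix F[y, x]
F = np.array([[1, 1, 0],
              [0, 0, 1]], float)
q = F @ p                               # q = f . p, so q is determined by f and p

# s : Y -> X stochastic ("hypothesis"), column-stochastic S_[x, y]
S_ = np.array([[0.6, 0.1],
               [0.4, 0.1],
               [0.0, 0.8]])

# S(p, s.q) equals S(p, s.f.p), precisely because q = f.p
assert np.allclose(S_ @ q, S_ @ (F @ p))
print(relative_entropy(p, S_ @ q))
```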

view this post on Zulip Hakimi Rashid (Jan 01 2021 at 08:29):

I see. By the way, I'm also considering the kind of solution that an enriched Markov category would bring, similar to this thesis: https://www.erischel.com/documents/mscthesis.pdf. It's just that I don't know which approach would be easier and generalize better to other statistical distances. Maybe to define other distances we need to consider different categories in which to enrich the Markov category?

view this post on Zulip John Baez (Jan 01 2021 at 17:44):

Hakimi Rashid said:

I'm trying to adapt the relative entropy construction from https://arxiv.org/pdf/1402.3067.pdf and https://www.cs.mcgill.ca/~prakash/Pubs/ccre.pdf using Markov categories. In those papers, a morphism between objects in FinStat is a pair (f,s) : (X,p) → (Y,q), and the RE functor sends it to S(p, s∘q). Why do we need the f morphism? Can we do it with just s, leaving out f?

There are lots of different categories, good for different things. In the category that Tobias and I called FinStat, a morphism describes 1) how a state of the system being observed deterministically produces an observation, and 2) a recipe for guessing the state of the system from an observation. Part 1) is the measure-preserving function f from X to Y, and part 2) is the stochastic map s from Y back to X.

I think if you read the introduction to our paper you'll see that both of these are used to compute relative entropy. We give the formula, and it involves both f and s. We also explain what's going on.

view this post on Zulip Hakimi Rashid (Jan 01 2021 at 23:56):

Thank you for the input @John Baez . I've read the introduction of that paper, and if I understand it correctly, you framed the definition around a scenario of how RE might be used. However, in the setting I'm considering, I don't think I have the f morphism. I want to compare 2 time series from sensor recordings, X and Y, that take values in ℝ.

view this post on Zulip Hakimi Rashid (Jan 02 2021 at 00:12):

So, in this setting, I want to know how different / similar they are to each other.

view this post on Zulip Hakimi Rashid (Jan 02 2021 at 00:19):

Tobias Fritz said:

Briefly, I would say that we can't define characteristic functions in Markov categories just yet. Markov categories are much more general than plain old probability theory. Characteristic functions on the other hand are a concept rather specific to real-valued random variables in the traditional sense (although a very powerful concept for sure). That's why I think that there won't be something like a characteristic function in general Markov cats.

But it's entirely conceivable that there is something like a characteristic function for states in certain Markov categories on objects which "look like" the real numbers. I'm pretty sure though that nobody knows how to do this so far, so it's one of the many open questions.

It's also possible that there will be another concept for suitable Markov categories which is different from the characteristic function, but can replace the latter for some of its purposes, such as in the proof of the central limit theorem. But all of this is speculation at the moment, so just take it as a description of the scope of conceivable possibilities.

I'm guessing the reason is that if we apply the Fourier transform to a random variable, then we land in another category, outside of the Markov category. Is this true?

view this post on Zulip Tobias Fritz (Jan 02 2021 at 08:11):

Hakimi Rashid said:

I'm guessing the reason is that if we apply the Fourier transform to a random variable, then we land in another category, outside of the Markov category. Is this true?

Not exactly. The reason is because there is no such thing as "apply the Fourier transform" in the first place. How would that be defined?

view this post on Zulip Hakimi Rashid (Jan 02 2021 at 12:14):

I was thinking of defining it as a functor. Maybe I don't quite grasp the whole concept just yet.

view this post on Zulip Hakimi Rashid (Jan 10 2021 at 04:53):

Hi again. I'm still in the dark about why it does not make sense to define the Fourier transform of a random variable as a functor from a Markov category to 'another' category, a CF category: FT : Markov → CF, with its inverse iFT : CF → Markov. Then we can have a monad as the composite of the two, T = iFT∘FT. Would you kindly walk me through it? Thank you.

view this post on Zulip Tobias Fritz (Jan 10 2021 at 08:55):

When you propose a new mathematical idea, then you need to explain why you think that it does make sense, and this seems to be missing here. In other words, what does your proposal have to do with characteristic functions or the Fourier transform at all? Which categories and which functors do you need to pick in order to obtain a categorical description of the classical Fourier transform?

So what I know is that a real-valued random variable is, categorically speaking, a morphism 1 → ℝ in the Markov category Stoch (or BorelStoch), which is the Kleisli category of the Giry monad on measurable spaces (or standard Borel spaces). What I don't know is what your CF and FT might be, and how these would describe the Fourier transform.

Also, if iFT is the inverse of FT, then the induced monad is the identity monad, which does not seem interesting. Perhaps you mean adjoint rather than inverse?

view this post on Zulip Hakimi Rashid (Jan 10 2021 at 09:25):

Tobias Fritz said:

So what I know is that a real-valued random variable is, categorically speaking, a morphism 1 → ℝ in the Markov category Stoch (or BorelStoch), which is the Kleisli category of the Giry monad on measurable spaces (or standard Borel spaces). What I don't know is what your CF and FT might be, and how these would describe the Fourier transform.

So, here you mean I need to define a category of characteristic functions in such a way that FT is a functor that captures the notion of the Fourier transform, right?

view this post on Zulip Hakimi Rashid (Jan 10 2021 at 09:32):

Tobias Fritz said:

Also, if iFT is the inverse of FT, then the induced monad is the identity monad, which does not seem interesting. Perhaps you mean adjoint rather than inverse?

Why does the identity monad not seem interesting in this case? If adjoint, what could that mean concretely?

I thought the main idea of using the Fourier transform is to work in a 'transformed' space that completely preserves the information of the original space. So you can do something within this space that would be hard to do in the original space, and then transform the result back to the original space. So, identity seems 'correct', right?

view this post on Zulip Tobias Fritz (Jan 10 2021 at 14:03):

Hakimi Rashid said:

So, here you mean I need to define a category of characteristic functions in such a way that FT is a functor that captures the notion of the Fourier transform, right?

All I mean is that you need to explain the meaning and significance of your idea in order for anyone else to be able to comment on it. One way to achieve that is to explain which particular category CF and which functor FT you have in mind.

Why does the identity monad not seem interesting in this case? If adjoint, what could that mean concretely?

The identity monad is never interesting, just like the identity functor isn't an interesting functor or the one-element group isn't an interesting group. There isn't much that you can do with the identity monad on any category C: both its Kleisli category and its Eilenberg-Moore category are just C again.

I thought the main idea of using the Fourier transform is to work in a 'transformed' space that completely preserves the information of the original space. So you can do something within this space that would be hard to do in the original space, and then transform the result back to the original space.

That sounds like a good description. But this doesn't necessarily mean that it's appropriate to do the same thing at the categorical level. The categorical generalization of a standard concept often takes quite a different form than the original concept. For example, epimorphisms are a categorical generalization of surjective functions, but the definition of epimorphism and the definition of surjective function are quite different.

view this post on Zulip Hakimi Rashid (Jan 10 2021 at 14:18):

Maybe I should start with some context. I am working with time series data of random variables. Many methods have been developed to compute the statistical 'distance/similarity' between 2 time series, including KL divergence, mutual information, etc. These time series can also be represented in the frequency domain by applying the Fourier transform to each of them. We can then also compute the 'distance/similarity' between them within the frequency domain; examples include coherence and many more.

For the time domain, I think a Markov category is a suitable category to work with. But for the frequency domain, I thought there should exist a 'dual' of the Markov category, or maybe some other category that is somehow connected to the original Markov category.

My aim is to distill / unify the process of:
1) 'embedding' time series to particular representation
2) use this representation structure to compute 'statistical distance'

Am I on the right track on using Markov category so far? Or should I consider other direction?

view this post on Zulip Hakimi Rashid (Jan 11 2021 at 06:15):

Tobias Fritz said:

All I mean is that you need to explain the meaning and significance of your idea in order for anyone else to be able to comment on it. One way to achieve that is to explain which particular category CF and which functor FT you have in mind.

Say, what if the objects in the category CF are the Fourier-transformed pdfs of random variables, and the morphisms in CF are the Fourier transforms of morphisms between random variables in the Markov category? Would that be correct for what I'm aiming for?

view this post on Zulip Tobias Fritz (Jan 11 2021 at 06:18):

Yes, that clarifies things for me. If I understand correctly, the Fourier transform that you're taking is not the characteristic function, right? Because that would be E[e^{itX}] for one fixed random variable X, and this function is no longer random because the expectation value has been taken. While what you're doing is to take the Fourier transform in time separately for every realization of the process, which is then itself a random function. Right?

As far as Markov categories go, a number of people have already asked about developing stochastic process theory within that framework, but so far this doesn't exist yet. I can imagine that the equivalence that you describe has a categorical description in Markov category terms, but instead of it being an equivalence or isomorphism between two Markov categories, to me it looks more like an isomorphism of objects internal to a single Markov category.

In any case, for thinking about such a thing I guess it would help to already have some stochastic process theory in place for Markov categories — so you may want to think about this more generally first. How do you define a Markov category of stochastic processes? What are the morphisms? And how can one formulate and prove some of the basic theorems on stochastic processes?
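The distinction drawn above can be illustrated numerically. In this small Python sketch (distributions, sizes, and the seed are all arbitrary), the characteristic function is a deterministic function of t obtained by averaging over samples, while the Fourier transform of each sample path of a process is itself random:

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Characteristic function: E[exp(itX)] for ONE random variable X.
# Estimated by averaging over samples; the randomness is averaged out.
x = rng.normal(loc=0.0, scale=1.0, size=100_000)
t = 1.3
phi_hat = np.mean(np.exp(1j * t * x))
# For X ~ N(0,1) the exact value is exp(-t^2/2); phi_hat approximates it.

# --- Per-realization Fourier transform: FFT of each sample path of a process.
# Each row is one realization of a length-64 noise process; each spectrum is random.
paths = rng.normal(size=(5, 64))
spectra = np.fft.fft(paths, axis=1)   # 5 different random spectra

print(abs(phi_hat - np.exp(-t**2 / 2)))  # small estimation error
print(spectra.shape)                      # one spectrum per realization
```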

view this post on Zulip Hakimi Rashid (Jan 11 2021 at 06:27):

Tobias Fritz said:

Yes, that clarifies things for me. If I understand correctly, the Fourier transform that you're taking is not the characteristic function, right? Because that would be E[e^{itX}] for one fixed random variable X, and this function is no longer random because the expectation value has been taken. While what you're doing is to take the Fourier transform in time separately for every realization of the process, which is then itself a random function. Right?

Ah, yes. I realize my mistake. I thought what I was looking for was the characteristic function, since both involve taking a Fourier transform. But what you describe is closer to what I'm aiming for. It should be random.

So, the Fourier transform is deterministic, but in combination with the pdf the result is still random: FT∘ψ : I → X → FT(X). Does this make sense? Because the result has a different sample space (frequency) than the original sample space.

view this post on Zulip Hakimi Rashid (Jan 11 2021 at 06:32):

Tobias Fritz said:

As far as Markov categories go, a number of people have already asked about developing stochastic process theory within that framework, but so far this doesn't exist yet. I can imagine that the equivalence that you describe has a categorical description in Markov category terms, but instead of it being an equivalence or isomorphism between two Markov categories, to me it looks more like an isomorphism of objects internal to a single Markov category.

So, the objects (the Fourier-transformed random variables (RVs) of a stochastic process) live in the Markov category, since they are still RVs?

view this post on Zulip Tobias Fritz (Jan 11 2021 at 11:11):

Yep, that all sounds right to me :smile:

view this post on Zulip Hakimi Rashid (Jan 11 2021 at 11:59):

Thank you .

view this post on Zulip Hakimi Rashid (Jan 17 2021 at 15:32):

Hi, again. Regarding the RE functor developed in the paper (https://arxiv.org/pdf/1402.3067.pdf): you and @John Baez considered the case of one system X and one measurement Y, with the morphisms between them being f : X → Y, regarded as the 'measuring process', and s : Y → X as the 'hypothesis'. Given the true probability p and the 'prior' s∘q, RE is defined as the amount of information gained when we update our prior to the true probability.

Whereas, in my case, I am considering 2 systems, each with a probability on its states: (X,p) and (Y,q). Both send signals that can be recorded as time series. They may communicate with each other, thus influencing each other's states with a certain probability. I want to define the RE between them by adapting the definition provided by the paper.

First, define channels c_{x,y} : X → Y and c_{y,x} : Y → X to represent the communication between them. Then RE is a functor that sends objects to the single object of the category [0,∞], the morphism c_{y,x} : Y → X to S(p, c_{y,x}∘q), and the morphism c_{x,y} : X → Y to S(q, c_{x,y}∘p). Would this work?

view this post on Zulip Hakimi Rashid (Jan 17 2021 at 15:36):

Also, I want to define mutual information and correlation between the two systems following a similar line of thinking...

view this post on Zulip Javier Prieto (Jan 18 2021 at 15:47):

Have you checked that your definition does what you need in a simple setting, like FinStoch?

view this post on Zulip Hakimi Rashid (Jan 18 2021 at 16:13):

No, I haven't. Could you show me how, or point the way? I'm no mathematician; my background is in biology, but I want to use category theory to describe part of the data analysis involved in my current research.

view this post on Zulip Javier Prieto (Jan 18 2021 at 18:57):

Are your channels stochastic matrices? If so, you can work in FinStoch (the category with finite sets as objects and stochastic matrices as morphisms) and try to prove that your candidate functor S respects identities and composition.
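In that spirit, here is a minimal numerical sketch of the proposed assignment in FinStoch-like terms. The channel and distributions are made up, and whether the assignment is actually functorial is exactly what would still need checking; the sketch only shows the mechanics of evaluating it:

```python
import numpy as np

def is_column_stochastic(M):
    """A FinStoch morphism: nonnegative entries, each column sums to 1."""
    return bool(np.all(M >= 0) and np.allclose(M.sum(axis=0), 1.0))

def rel_ent(p, r):
    """S(p, r) = sum_x p(x) * log(p(x) / r(x)), with 0 * log 0 = 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / r[mask])))

p = np.array([0.2, 0.8])          # state of system X
q = np.array([0.5, 0.3, 0.2])     # state of system Y

# Hypothetical channel c_{y,x} : Y -> X as a column-stochastic matrix C[x, y]
C = np.array([[0.9, 0.4, 0.1],
              [0.1, 0.6, 0.9]])
assert is_column_stochastic(C)

# Candidate assignment: the channel goes to S(p, c_{y,x} . q)
value = rel_ent(p, C @ q)
print(value)

# Sanity check on identities: id_X should go to S(p, p) = 0.
# One must still check what the candidate does on composites before calling it a functor.
assert rel_ent(p, np.eye(2) @ p) == 0.0
```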

view this post on Zulip Hakimi Rashid (Jan 18 2021 at 22:32):

Ok. That seems workable for me. Thank you.

view this post on Zulip Hakimi Rashid (Jan 20 2021 at 04:46):

Is FinStoch enough, though? Because I want to use the Fourier transform...

view this post on Zulip John Baez (Jan 20 2021 at 19:19):

I'm afraid nobody is answering your question because nobody knows what it means. "Is FinStoch enough?" is vague. It may be hard to turn this into a mathematically precise question, but if you do, then more people will answer it.

view this post on Zulip Hakimi Rashid (Jan 20 2021 at 22:26):

Oh. I see. I think I know part of the answer... In practice, we can only record a finite amount of signal and analyze a finite amount of data, so working in the category FinStoch should be enough. Regarding the Fourier transform, in practice we use the fast Fourier transform algorithm, which computes the discrete Fourier transform, and we apply it to finite time series, so again FinStoch should be the right category. Is this correct?
When working in the FinStoch category, we limit ourselves to finite sets, right?
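The point about finite data can be made concrete: the discrete Fourier transform takes a finite series to a finite list of coefficients, and the inverse transform recovers the original series exactly. A tiny Python sketch (the sensor values are made up):

```python
import numpy as np

# A finite recorded time series (hypothetical sensor values)
x = np.array([0.0, 1.0, 0.5, -0.3, 0.2, 0.9, -1.1, 0.4])

X = np.fft.fft(x)        # discrete Fourier transform: 8 samples in, 8 coefficients out
x_back = np.fft.ifft(X)  # inverse DFT recovers the original series

assert np.allclose(x, x_back.real)
print(X.shape)  # finite in, finite out
```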

view this post on Zulip John Baez (Jan 20 2021 at 22:33):

Yes, that's what the "Fin" in "FinStoch" means.

view this post on Zulip Hakimi Rashid (Jan 20 2021 at 22:39):

Thank you. Maybe I should limit myself to the finite setting, since that is what we do in practice.

view this post on Zulip John Baez (Jan 20 2021 at 22:40):

It makes a bunch of theorems easier to prove, which is why Tobias and I focused on that case. Measure theory is simple on finite sets.

view this post on Zulip Hakimi Rashid (Jan 20 2021 at 22:57):

To be honest... I was expecting to grab low-hanging fruit when I stumbled upon category theory. I was expecting that people had worked on all the parts related to what I do, and that I could just cite their theorems/proofs and compile them together. When I found Evan's thesis, https://arxiv.org/pdf/2006.08945.pdf, I hit the jackpot. But it seems the work does not cover correlation and other statistical distances/similarities. Are you aware of any work that has been done on this or related to this?

view this post on Zulip John Baez (Jan 20 2021 at 23:01):

Nope, ask him.

There's a huge amount of work on category theory, but Evan - a young guy who just finished grad school - is one of the first to apply it to statistics. You've hit the jackpot if you want to help develop an interesting new branch of mathematics... but not if you just want to "cite theorems".

view this post on Zulip John Baez (Jan 20 2021 at 23:02):

Anyway, you should ask @Evan Patterson, not me, what work he's aware of in this area. He's the expert.

view this post on Zulip Eric Forgy (Jan 20 2021 at 23:06):

This is probably not very helpful, but I remember Tom Leinster had some interesting articles about population and diversity. It is not traditional statistics I guess, but I actually had some success applying that to finance, replacing populations of penguins with financial securities. When markets all move together, e.g. during a crisis, the effective diversity of the population decreases. I used it to identify extreme conditions for stress testing.

view this post on Zulip Hakimi Rashid (Jan 20 2021 at 23:08):

Did you use any distance measure/ statistical divergence in that work?

view this post on Zulip John Baez (Jan 20 2021 at 23:09):

I actually had some success applying that to finance replacing populations of penguins with financial securities.

Wow, you financed replacing populations of penguins with financial securities? What a typical Wall Street thing to do! :upside_down:

view this post on Zulip Spencer Breiner (Jan 20 2021 at 23:10):

You might also look at the paper here, which gives a relation between the cumulants of a distribution (mean, variance, central moments, ...) and homotopy theory, though it is not an easy paper.
https://arxiv.org/abs/1302.3684

view this post on Zulip Eric Forgy (Jan 20 2021 at 23:10):

If you are interested, here are the old articles that inspired me:

view this post on Zulip Hakimi Rashid (Jan 20 2021 at 23:14):

Thank you @Spencer Breiner and @Eric Forgy . I'll take my time and go thru the papers.

view this post on Zulip Eric Forgy (Jan 20 2021 at 23:14):

John Baez said:

I actually had some success applying that to finance replacing populations of penguins with financial securities.

Wow, you financed replacing populations of penguins with financial securities? What a typical Wall Street thing to do! :upside_down:

Collateralized Penguin Obligations. AAA rated by Moodys :nerd:

view this post on Zulip John Baez (Jan 20 2021 at 23:19):

Penguins may go extinct, but Collateralized Penguin Obligations will remain. :penguin: :penguin: :penguin:

view this post on Zulip Eric Forgy (Jan 20 2021 at 23:19):

CPOs precipitated the last global financial crisis, but don't blame me :sweat_smile:

view this post on Zulip Fawzi Hreiki (Jan 20 2021 at 23:20):

FYI, Tom's work on this is now in book form and due to be published soon.

view this post on Zulip Eric Forgy (Jan 20 2021 at 23:21):

Fawzi Hreiki said:

FYI, Tom's work on this is now in book form and due to be published soon.

Cool. I always liked these ideas. Good to see them coming to fruition :+1:

view this post on Zulip Eric Forgy (Jan 20 2021 at 23:31):

Semi-seriously, diversification is important in portfolio management and Tom's ideas on diversity certainly apply to financial time series where each time series is like a different species. That is my second "first" in mathematical finance I guess (although never published). The first "first" was applying noncommutative geometry to finance :nerd:

view this post on Zulip Hakimi Rashid (Jan 29 2021 at 16:05):

Hi, all. Another question: how can we define algebraic operations on random variables in a Markov category? For example, addition, subtraction, multiplication, etc.

view this post on Zulip Hakimi Rashid (Jan 31 2021 at 09:00):

Would this work:
1) for multiplication, mult : X⊗Y → XY,
2) for addition, add : X⊗Y → X+Y,
3) for subtraction, sub : X⊗Y → X−Y,
4) for division, div : X⊗Y → X÷Y?
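One concrete reading of "addition", at least at the level of states: addition is a deterministic map applied after forming a joint state, and for independent variables it pushes the product distribution forward to the convolution of the two pmfs. (This sidesteps the earlier issue that "X+Y" as written names an object by its values.) A hypothetical finite example in Python:

```python
import numpy as np

# pmfs of two independent integer-valued random variables on {0, 1, 2}
p = np.array([0.2, 0.5, 0.3])
q = np.array([0.6, 0.3, 0.1])

# 1) Form the joint (product) state on X (x) Y
joint = np.outer(p, q)

# 2) Push forward along the deterministic map add(x, y) = x + y
support = np.arange(2 * (len(p) - 1) + 1)   # possible sums 0..4
r = np.zeros(len(support))
for i in range(len(p)):
    for j in range(len(q)):
        r[i + j] += joint[i, j]

# The pushforward is exactly the convolution of the two pmfs
assert np.allclose(r, np.convolve(p, q))
assert np.isclose(r.sum(), 1.0)
print(r)
```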

view this post on Zulip Hakimi Rashid (Feb 02 2021 at 04:57):

In https://arxiv.org/pdf/2006.08945.pdf, the notion of an interacting supply was introduced so that several structures, such as vector spaces, can be defined within a Markov category. In this case, if we supply a Markov category with vector spaces, can we then define an inner product within the Markov category?

view this post on Zulip John Baez (Feb 02 2021 at 05:29):

That's @Evan Patterson's thesis.

view this post on Zulip Evan Patterson (Feb 02 2021 at 05:48):

Not directly, since an abstract vector space doesn't come with an inner product. You can get partway there by introducing a symmetric bilinear map, since those properties are purely equational. At least within the framework of my thesis, you can't express the positive definiteness of an inner product.

view this post on Zulip Hakimi Rashid (Feb 02 2021 at 07:40):

Thank you for your reply, Evan. I'm gonna need some time to digest it. In the meantime, do you happen to know the answer to my question regarding algebraic operations on random variables?

view this post on Zulip Hakimi Rashid (Feb 02 2021 at 08:08):

John Baez said:

You've hit the jackpot if you want to help develop an interesting new branch of mathematics... but not if you just want to "cite theorems".

Thank you John, that feels welcoming. But the current me is really not equipped to develop new math. I don't have formal higher math education or training. Currently, I can only digest and hopefully understand proven theorems, link them together, and, if applicable, use them in my work.

view this post on Zulip John Baez (Feb 02 2021 at 17:45):

I understand. Then you may be somewhat frustrated: perhaps not all the tools you want exist yet. You can work with the tools that exist, or try to find a mathematician collaborator who can develop the new tools that you need. To work with mathematicians it helps to state your needs as precisely as possible, in a lot of detail, starting by describing your background assumptions. I often don't understand what you're saying.

It sounds like Evan understood your last question, and his reply makes a lot of sense to me. He said the definition of inner product isn't purely equational: it involves an inequality too. From this I guess he has a framework for introducing purely equational concepts into the theory of random variables. This makes sense, because there's a lot of math developed for purely equational theories, like "Lawvere theories".

All this is just a bunch of guesses based on his reply: I haven't read the paper of his, that you're talking about. I mention my guesses just to show that a mathematician's view may be very different than yours: what's clear to you may be mysterious to the mathematician, and what's mysterious to you may be clear to the mathematician. So, you need to put a lot of energy into clear communication if you want to reach mathematicians.

view this post on Zulip Hakimi Rashid (Feb 05 2021 at 15:56):

Sorry to bother you again, @Evan Patterson. In your thesis, you mention the loss function L(θ, d(X)) in the notes and references section of chapter 3. From the string diagram, I think we can define it as L : Ω ⊗ d(X) → ℝ. Is this true?

Is the loss function deterministic?

Can I use it to compare between 2 variables?

view this post on Zulip Evan Patterson (Feb 06 2021 at 22:16):

The loss function has the form L : Ω × 𝒜 → ℝ: for a given parameter θ ∈ Ω, what is the loss under action a ∈ 𝒜? Decision rules have the form d : 𝒳 → 𝒜: given the data x ∈ 𝒳, what action do I take?

AFAIK, people always take the loss to be deterministic, but you could make it randomized. Decision rules can also be randomized and sometimes actually are. For example, resampling procedures like cross-validation are often randomized. In the end, decision theorists study the risk (expected loss), which averages out the randomness in the samples, the decision rule, and the loss (if you were to allow randomized losses).
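The risk described above can be sketched numerically. In this hypothetical finite decision problem (all numbers invented), the risk is the expected loss under the sampling distribution for a fixed parameter:

```python
import numpy as np

# Hypothetical finite decision problem:
# parameter theta in {0, 1}, data x in {0, 1, 2}, action a in {0, 1}.
theta = 1                                  # fixed "true" parameter

# Sampling distribution P(x | theta), stored as a column-stochastic matrix P[x, theta]
P = np.array([[0.7, 0.1],
              [0.2, 0.3],
              [0.1, 0.6]])

# Deterministic decision rule d : X -> A (guess a = 0 when x = 0, else a = 1)
d = np.array([0, 1, 1])

# Loss L(theta, a): 0-1 loss
L = np.array([[0.0, 1.0],
              [1.0, 0.0]])

# Risk = E_{x ~ P(.|theta)}[ L(theta, d(x)) ]
risk = sum(P[x, theta] * L[theta, d[x]] for x in range(3))
print(risk)  # 0.1: the probability of guessing wrong when theta = 1
```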

view this post on Zulip Hakimi Rashid (Feb 07 2021 at 04:54):

Thank you for the input @Evan Patterson . I was thinking of defining relative entropy in a form similar to the loss function. Since the KL divergence is the expected log difference between 2 probabilities P and Q, can we define KL = E∘L : P × Q → ℝ in a Markov category?

where E is the expectation and L is the log difference between P and Q.
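For two pmfs on the same finite set, this composite of a log-difference map with an expectation can be written out directly; a small Python sketch (P and Q are arbitrary examples):

```python
import numpy as np

# P and Q as pmfs on the same finite set
P = np.array([0.4, 0.4, 0.2])
Q = np.array([0.5, 0.25, 0.25])

# "L": the pointwise log difference log P(x) - log Q(x)
log_ratio = np.log(P) - np.log(Q)

# "E": expectation under P; the composite E . L is exactly KL(P || Q)
kl = float(np.sum(P * log_ratio))

assert kl >= 0        # Gibbs' inequality: KL is nonnegative
print(kl)
```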