 
        
        You're reading the public-facing archive of the Category Theory Zulip server.
        
        To join the server you need an invite. Anybody can get an invite by contacting Matteo Capucci at name dot surname at gmail dot com.
        
        For all things related to this archive refer to the same person.
        
I have been reading about Markov categories... From what I understand, a Markov category is a semicartesian monoidal category in which every object has a comonoid structure. So the objects have morphisms like copy, which is the comultiplication, and delete, which is the counit, right?  
1) It seems to me it also has unit morphisms $I \to X$, which give probability distributions on $X$. Just curious: what would happen if we equipped objects with monoid structures? What would the multiplication $X \otimes X \to X$ do?
2) Regarding the delete morphism $X \to I$: does this morphism discard the whole data, or just the probability distribution on $X$, so that you get back the sample?
3) Does probability come with any dual structure akin to the dual vector space?
Indeed, unit morphisms exist and have that interpretation, see page 11 in Fritz19. If you want a monoid structure mirroring the comonoid, the multiplication should have type $X \otimes X \to X$ for every object $X$ - I think you can't just pair any two objects, because what would multiplying two different objects even mean?
I guess I was thinking along the lines of defining a statistical distance like the KL divergence, or maybe the Fisher information metric, to compare two distributions within a Markov category...
I assume you can define the KL divergence for any two parallel arrows but I don't think that's been done in this framework (yet)
Thank you for your reply. Would this work : we define as a generalized version of i.e, it maps X to scalars instead of delete just like counit in , . Then, KL divergence: ?
Do we need multiplication ? Does it define joint probability? Would dropping out of the above definition i.e do the same job?
A construction along those lines might work in because the monoidal unit is , but in for example the monoidal unit is the one-element set so you cannot "map to scalars" - the discard map is unique.
Joint probabilities on the pair are defined as arrows - at least in , but I believe this is true in any Markov category.
Hi both! I was just about to write a similar reply, so thanks for having mentioned that already @Javier Prieto (I'm unfortunately too busy these days to spend much time here...)
Quite generally, I think that the best methodology in applied category theory is to look at and understand the mathematical structures in the applied context, in probability in this case, and then abstracting to categorical structures from there. So in the case of KL divergence, I don't think that a description of the form will exist, since I don't know what the individual meaning of and in probability theory would be. At least one of them would have to be non-linear, since KL divergence is non-linear, which is at odds with the fact that morphisms in Markov categories (and variants thereof) are taken to be linear in the probabilities. Perhaps one can entertain categories with morphisms that act non-linearly, but then one needs to answer the question of what their probabilistic meaning and significance is.
One possible use for monoid structures would be to axiomatize categories of monoid-valued random variables. For example with real-valued random variables, whose distributions are morphisms $I \to \mathbb{R}$, we may be interested in adding two of them. This amounts to composing their joint distribution $I \to \mathbb{R} \otimes \mathbb{R}$ with the addition map $+ : \mathbb{R} \otimes \mathbb{R} \to \mathbb{R}$. I don't think that anyone has thought about such categories yet, and prior to doing so one should have an idea of what one is trying to achieve with it. (In many probability theory statements, like the laws of large numbers, one is interested in averaging a bunch of random variables rather than adding them. Although averaging and adding of real numbers are operations which differ only by a factor of $n$, I've come to regard their categorical generalizations as quite different kinds of beasts: addition is captured by monoid structures, while averaging has more to do with Representable Markov categories.)
Hopefully some of this gives some additional insight beyond what Javier already said.
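To make the addition example concrete, here is a minimal sketch in Python for finite distributions. It assumes the two random variables are independent (only so that there is a joint distribution to work with) and represents a distribution as a dictionary from outcomes to probabilities; the helper names `pushforward` and `independent_joint` are made up for this illustration.

```python
from collections import defaultdict
from fractions import Fraction


def pushforward(joint, f):
    """Push a finite joint distribution {(x, y): prob} forward along f(x, y)."""
    out = defaultdict(Fraction)
    for (x, y), prob in joint.items():
        out[f(x, y)] += prob
    return dict(out)


def independent_joint(p, q):
    """Joint distribution of two independent finite distributions.

    Independence is an assumption made only for this demo.
    """
    return {(x, y): px * qy for x, px in p.items() for y, qy in q.items()}


# Two fair coins taking the values 0 and 1.
coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}
joint = independent_joint(coin, coin)

# Composing the joint distribution with addition gives the law of the sum.
law_of_sum = pushforward(joint, lambda x, y: x + y)

# Averaging only rescales the outcomes by 1/n (here n = 2).
law_of_mean = pushforward(joint, lambda x, y: Fraction(x + y, 2))
```

The probabilities are the same in both cases; only the labels of the outcomes differ, which is the "factor of $n$" mentioned above.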
Thank you @Tobias Fritz and @Javier Prieto for your replies. Does this mean that we cannot define any statistical distance at all in a Markov category? Or is there any statistical distance that is 'natural' and definable in a Markov category?
If, let's say, we want to have a categorical framework for statistical distances, do we need to start from scratch by defining a new category, or can we add structure to a Markov category by somehow enriching it in order to capture notions of statistical distance?
This is not an answer but it's pointing at one: you may find this stream and the links therein interesting. In particular, there is this paper
John C. Baez, Tobias Fritz, A Bayesian Characterization of Relative Entropy
in which relative entropy is defined as a functor from $\mathsf{FinStat}$ to $[0, \infty]$. The definition of $\mathsf{FinStat}$ is a bit involved, but I think it embeds into because the morphisms are essentially certain diagrams involving stochastic maps. I don't know if this functor can be extended/defined on .
I don't have a definite answer on the last question either, but very roughly: distances are real numbers, and finding any general categorical construction which outputs real numbers is tricky. It would have to involve some kind of limit. Assuming a suitable kind of enrichment would indeed be the easier and perhaps more sensible thing to do, and I would hope that many of the existing results for Markov categories can be generalized to the enriched case, and thereby apply in situations where one wants to talk about e.g. approximate equality of distributions.
On the other hand, although getting intrinsic categorical notions of distance is difficult, @Eigil Rischel has recently proposed an intrinsic topology on the hom-sets of every Markov category (satisfying some mild conditions). That this is possible has been very surprising to me. It's current work in progress, so I'm afraid that I can't say more at this point, simply because we don't really know much more.
Thank you @Javier Prieto for sharing those resources and @Tobias Fritz for sharing the current progress on this issue. I really hope and pray that these will be sorted out and get published soon, because I think most of my current work can be captured by Markov categories.
I am reading the paper https://arxiv.org/pdf/1709.00322.pdf, 'Disintegration and Bayesian Inversion via String Diagrams' by Cho and Jacobs. In Section 7, 'Beyond causal channels', they mention 'enlarging' the CD category, which enables the notions of scalars $I \to I$ and effects $X \to I$. What do these concepts mean intuitively in a probability setting? Can we use them somehow to define KL divergence?
That just refers to including probability distributions and channels which are not normalized. When they say "causal", they're referring to a categorical formulation and generalization of the normalization of probability.
After reading your paper with @John Baez on the relative entropy functor, I've come to realize that any distance function is a function that kind of lives in between two categories, if I understand it correctly. The input of the function lives in one category, for example a Markov category in the case of probability distributions, and the output lives in the realm of the monoid of real numbers. An exception would be vector spaces, because the vectors and the output of the inner product can live in the same category. Is this right?
That's one way to do it, and it's one which seems to capture entropy and KL-divergence particularly accurately. But it's also possible to work with categories enriched in metric spaces, so that one has a given measure of distance between any two morphisms (between the same two objects). For example the category of channels has a canonical enrichment like this if one measures the distance between two channels by (the channel generalization of) the total variation distance. Also Wasserstein distance can be treated like this. Of course, the difference is that these distances are then additional structures on the category and not canonically determined from the categorical structure alone.
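As a rough illustration of a distance between two parallel channels, here is a minimal Python sketch for finite channels written as row-stochastic matrices. It assumes that the "channel generalization" of total variation is the largest total variation distance over all inputs; that is one common choice and not necessarily the exact definition meant above. The function names and the example matrices are made up for the demo.

```python
def total_variation(p, q):
    """Total variation distance between two finite distributions (lists)."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def channel_distance(f, g):
    """Distance between two channels given as row-stochastic matrices.

    Assumption: take the worst case of the total variation distance
    between the output distributions over all inputs.
    """
    return max(total_variation(row_f, row_g) for row_f, row_g in zip(f, g))


# Three parallel channels from a 2-element set to a 2-element set.
f = [[0.9, 0.1], [0.2, 0.8]]
g = [[0.5, 0.5], [0.2, 0.8]]
h = [[0.7, 0.3], [0.6, 0.4]]

d_fg = channel_distance(f, g)
d_fh = channel_distance(f, h)
d_hg = channel_distance(h, g)
```

With this choice each hom-set becomes a metric space, so the triangle inequality `d_fg <= d_fh + d_hg` holds for any three parallel channels.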
can we then just abstract this notion of distance and similarity and capture it in single categorical construct that can cover all distance measure?
would structured cospan do the job?
Categories enriched in metric spaces are indeed a single categorical concept which covers a lot of distance measures. (KL divergence is one of the exceptions, due to the failure of the triangle inequality.) But this observation in itself doesn't really do much, since it's just a definition. It's about the same as saying that the definition of metric space covers all distance measures. That's true, but has little use in itself; it's more like a starting point for the development of a theory.
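The failure of the triangle inequality for KL divergence can be checked on a small example. The following is a minimal Python sketch; the three distributions are arbitrary choices made for the illustration.

```python
import math


def kl(p, q):
    """KL divergence D(p || q) in nats, for finite distributions where
    q is strictly positive wherever p is."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


# Three distributions on a two-element set.
p = [0.5, 0.5]
q = [0.25, 0.75]
r = [0.1, 0.9]

direct = kl(p, r)                 # going from p to r in one step
via_q = kl(p, q) + kl(q, r)       # going through q
```

Here `direct` comes out larger than `via_q`, which a metric would not allow. KL divergence is also not symmetric: `kl(p, q)` and `kl(q, p)` differ.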
I don't see any relation to structured cospans, but perhaps someone who has worked with those can say more.
Not me.
Owh... I was thinking about a construction akin to functor category in which the slots are not fixed but can be inserted with various compatible categories. So, we can have cospan like so: which results in functor category . Is there such a thing? Or is it total nonsense?
It doesn't parse - that is, it doesn't type-check. isn't a category if it doesn't have anything plugged into it, so what kind of morphisms should and be?
ok. it is total nonsense then. Is there any categorical construction that can achieve a similar goal? or the goal itself is not achievable?
What if i define a 2-category with the following diagram then define a 2-functor to other 2-category. will it achieved the goal of having variable categories?
In general, we can think of the construction as a -functor .
Hakimi Rashid said:
What if i define a 2-category with the following diagram then define a 2-functor to other 2-category. will it achieved the goal of having variable categories?
You likely can't do this with cospans in a nice way, since if is a non-empty category and is the empty category, then is also empty, so no functor even exists..!
If we restrict ourselves to compatible categories, would the construction work?
What do you expect your functors and to do? More generally, do you have an aim that's not accomplished by understanding itself as a 2-functor, without the cospan attached?
(That's not intended to be confrontational, I just want more of an insight into where you're going with this :grinning_face_with_smiling_eyes: )
Hakimi Rashid said:
After reading your paper with John Baez on the relative entropy functor, I've come to realize that any distance function is a function that kind of lives in between two categories, if I understand it correctly. The input of the function lives in one category, for example a Markov category in the case of probability distributions, and the output lives in the realm of the monoid of real numbers. An exception would be vector spaces, because the vectors and the output of the inner product can live in the same category. Is this right?
The reason is I want to have a construction that generalize distance and divergence along this line of thinking.
The functor is the input and is the output.
In a typical distance situation you have a "pairing" $\langle -, - \rangle$ which takes a pair of "things" $x$ and $y$ and outputs a "value" $\langle x, y \rangle$. When $x$ and $y$ are members of spaces or categories, we might impose the restriction that the pairing is natural in its variables, which is to say that varying $x$ and $y$ gives a comparable variation in the value of $\langle x, y \rangle$.
However, I don't know what situation you're imagining where there are mappings/transformations ; that's what I want to understand. You don't need these mappings in order for the pairing to make sense; you can just start with the pair  consisting of the input and output, without requiring that those maps/morphisms exist.
https://arxiv.org/pdf/1402.3067.pdf and https://www.cs.mcgill.ca/~prakash/Pubs/ccre.pdf
This is because I wanted to follow the approach taken by these papers above. They defined KL divergence as a functor between 2 categories. I want to generalize this to other distance measures as well which might take different categories as input and output. I think what I should use is a category of functor categories right?
The pairing is a construction akin to inner product right? where both and are the inputs and is the output. If I understand their paper correctly, should be the input category and is the output category and is the fuctor category where KL divergence live.
So, if this correct, then we can describe inner product as the following: or . right?
No, inner products act on elements within vector spaces. So in that case, for a specific vector space $V$, an inner product is a bilinear map $V \times V \to \mathbb{R}$. It's not an operation on the ordinary category of vector spaces.
But if we think of $\mathbb{R}$ as a 1-dimensional vector space, can we somehow define the bilinear map using an endofunctor? I was thinking of it as paraphrasing the definition in terms of a higher level of abstraction.
Hakimi Rashid said:
But if we think of $\mathbb{R}$ as a 1-dimensional vector space, can we somehow define the bilinear map using an endofunctor? I was thinking of it as paraphrasing the definition in terms of a higher level of abstraction.
But there can be many inner products on a given vector space (or, more generally, pairings between vector spaces), and linear maps do not necessarily respect them or canonically extend them, so the construction of inner products is not functorial.
I don't understand. Would you kindly elaborate more on inner products being not functorial? And also the issues of having many inner products.
Why can't the construction subsume inner product?
On making inner products functorial: the first problem is that there is no canonical way to equip a vector space with an inner product. One way to resolve this is to work instead with vector spaces equipped with a basis, since in that case there is a canonical inner product making that basis orthonormal. Next, one has the task of deciding which morphisms to choose: a morphism should consist of a linear map, but how should that interact with the basis, bearing in mind that we want the functor to send morphisms to transformations between inner product spaces? Well, one way that we can express that a linear map $f : V \to W$ "respects the inner product" is that it should end up satisfying the equation $\langle f(x), f(y) \rangle_W = \langle x, y \rangle_V$, where $\langle -, - \rangle_V$ and $\langle -, - \rangle_W$ are the inner products on the respective spaces. In other words, we want the linear maps whose matrix with respect to the bases of $V$ and $W$ has orthonormal columns. The result is a category of vector spaces with bases on which we can functorially assign inner products, but it no longer looks very much like $\mathsf{Vect}$, since for example there are no morphisms from higher dimensional spaces to lower dimensional spaces.
Note in particular that if you had wanted to view $\mathsf{Vect}$ as a category enriched over itself, this restricted category is no longer $\mathsf{Vect}$-enriched.
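The condition that a linear map respects the canonical inner products can be tested numerically. Here is a minimal Python sketch, assuming finite-dimensional real vector spaces with their chosen bases identified with standard coordinates; the matrices `embed` and `project` and the helper names are made up for the illustration.

```python
def dot(u, v):
    """Standard inner product of two coordinate vectors."""
    return sum(a * b for a, b in zip(u, v))


def apply(matrix, v):
    """Apply a matrix (given as a list of rows) to a coordinate vector."""
    return [dot(row, v) for row in matrix]


def has_orthonormal_columns(matrix, tol=1e-12):
    """Check that the columns of the matrix form an orthonormal family."""
    cols = [list(col) for col in zip(*matrix)]
    return all(
        abs(dot(cols[i], cols[j]) - (1.0 if i == j else 0.0)) <= tol
        for i in range(len(cols))
        for j in range(len(cols))
    )


# A linear map R^2 -> R^3 whose columns are orthonormal.
embed = [[1.0, 0.0],
         [0.0, 1.0],
         [0.0, 0.0]]

# A projection R^3 -> R^2: three columns in R^2 cannot be orthonormal.
project = [[1.0, 0.0, 0.0],
           [0.0, 1.0, 0.0]]

x, y = [2.0, -1.0], [0.5, 3.0]
preserved = dot(apply(embed, x), apply(embed, y)) == dot(x, y)
```

`embed` preserves the inner product of `x` and `y`, while `project` fails the column test, matching the remark that there are no such morphisms from higher dimensional spaces to lower dimensional ones.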
As for "the construction $[\mathsf{Vect}, \mathbb{R}]$", we can make $\mathbb{R}$ into an ordinary category (by equipping it with morphisms representing the ordering) but this structure is not particularly compatible with its vector space structure, and in particular any functors from $\mathsf{Vect}$ into this category are constant, so reading that as a functor category won't give you anything resembling an inner product. We could instead interpret $\mathbb{R}$ as a one-object category enriched over the category of real vector spaces, but then $[\mathsf{Vect}, \mathbb{R}]$ consists of functors which necessarily map each vector space to that one object, and provide a linear map from each vector space of linear maps to $\mathbb{R}$. That's something similar to an inner product, but still not quite what you're looking for, I think.
wow, thank you for your explanation. I need some time to unpack the whole thing. Looks like there's a lot of homework I need to do just to understand your answer.
When understanding CT, it helps to unpack all of the definitions, if nothing else to make sure that something you've written makes sense! Good luck :grinning:
[Mod] Morgan Rogers said:
On making inner products functorial: the first problem is that there is no canonical way to equip a vector space with an inner product. One way to resolve this is to work instead with vector spaces equipped with a basis, since in that case there is a canonical inner product making that basis orthonormal. Next, one has the task of deciding which morphisms to choose: a morphism should consist of a linear map, but how should that interact with the basis, bearing in mind that we want the functor to send morphisms to transformations between inner product spaces? Well, one way that we can express that a linear map $f : V \to W$ "respects the inner product" is that it should end up satisfying the equation $\langle f(x), f(y) \rangle_W = \langle x, y \rangle_V$, where $\langle -, - \rangle_V$ and $\langle -, - \rangle_W$ are the inner products on the respective spaces. In other words, we want the linear maps whose matrix with respect to the bases of $V$ and $W$ has orthonormal columns. The result is a category of vector spaces with bases on which we can functorially assign inner products, but it no longer looks very much like $\mathsf{Vect}$, since for example there are no morphisms from higher dimensional spaces to lower dimensional spaces.
If I understand this part, the claim that $[\mathsf{Vect}, \mathsf{Vect}]$ captures the inner product will only be true for a very limited subset of $\mathsf{Vect}$, right?
[Mod] Morgan Rogers said:
We could instead interpret $\mathbb{R}$ as a one-object category enriched over the category of real vector spaces, but then $[\mathsf{Vect}, \mathbb{R}]$ consists of functors which necessarily map each vector space to that one object, and provide a linear map from each vector space of linear maps to $\mathbb{R}$. That's something similar to an inner product, but still not quite what you're looking for, I think.
I think this is quite similar to what they define for relative entropy functor in those papers, right?
Hakimi Rashid said:
I think this is quite similar to what they define for relative entropy functor in those papers, right?
No, their relative entropy is a specific functor; is a whole category of functors.
Hakimi Rashid said:
If I understand this part, the claim that $[\mathsf{Vect}, \mathsf{Vect}]$ captures the inner product will only be true for a very limited subset of $\mathsf{Vect}$, right?
It seems like this notation does not mean what you think it means. $[\mathsf{Vect}, \mathsf{Vect}]$ means either the category of ordinary endofunctors of $\mathsf{Vect}$, or the category of $\mathsf{Vect}$-enriched endofunctors of $\mathsf{Vect}$. I don't think either of these things in any way "captures inner product".
Yeah, [Vect, Vect] has nothing much to do with "inner product".
Oh. I get it. If I try to generalize by simply going higher in the hierarchy of abstraction, then it will carry with it whole other baggage that is unnecessary and unrelated to the concepts I am trying to generalize.
I thought I can generalize both and other statistical distances by following the pattern laid out in the papers. Has it ever been done before? Or it is something that is not possible?
If we restrict ourselves to statistical distances only, and exclude vector spaces and inner products, then we can still capture many other statistical distances / similarity measures such as correlations, KL divergence, Wasserstein distance, etc. using a functor category, right?
So, the construction still applies. We just need to define compatible categories X and Y. Is this correct?
Hakimi Rashid said:
I thought I can generalize both and other statistical distances by following the pattern laid out in the papers.
But the thing you keep describing doesn't look anything like what appears in the papers you mentioned!!
In the papers, we have...
There is just one rather special functor in each situation. The notation $[0, \infty]$ is classical notation for "the positive real numbers, with infinity", rather than a functor category, which might have confused you?
Oh no. I am really confused now. I need to dissect the whole thing again to pinpoint where I might go wrong. KL divergence in their paper is a functor right?
A functor between the input category (in the papers, FinStat and SbStat) and the output category (the 'measure category').
Since it is a specific functor, it lives in the functor category [input, output].
Hence, if I want to define other measures (e.g. correlation, Wasserstein distance, etc.) categorically, I can follow their 'formula' by defining, for each measure, a functor from a specific category to a compatible measure category.
So far... anything wrong?
Every specific measure is a specific functor between specific categories. They live in different functor categories.
I thought that if I could have a construction where I can vary the input category and the output category, and thus the functor category, I could generalise the definition, with the relative entropy functor as one specific example.
Yes, that's all correct, but
Rather than considering the whole functor category (which probably contains a lot more stuff than you need, yet on the other hand which it's hard to find anything much in!) a good approach would be to identify what features make their construction work. What makes a good "input category" or "output category"? What features make this approach work? In particular, which features are common with the other examples that you want to consider?
By that, you mean it is possible to have a single categorical construct that can be used to define them all? or there is no such thing and thus I need to define them one by one?
There could well be a single construction that works for many of them; it's all about finding the right level of generality :grinning_face_with_smiling_eyes:
Ok. understood.
I appreciate that wasn't a very specific pointer to what you should do! Concretely, I'd say you have enough work ahead if you just focus on defining categories on which the measures make sense, categories which are suitable for measuring, and concrete examples of functors between them, so I'm just discouraging you from dealing with whole functor categories before you're ready. It might be that the properties featuring in the article (semicontinuous, convex linear, etc) have interesting translations into properties of the corresponding objects in , and I personally would be very interested to discover if that is the case, but finding that out might not be the most direct route to the generalisation you're seeking.
Thank you for your tips and guidance. So, this has never been done before? Or if you know someone who has / is currently working on it? I actually was hoping that this has been done before so that I can use the result as part of my work.
Hi again. I was wondering... Can we define characteristic function of a random variable within Markov category?
Briefly, I would say that we can't define characteristic functions in Markov categories just yet. Markov categories are much more general than plain old probability theory. Characteristic functions on the other hand are a concept rather specific to real-valued random variables in the traditional sense (although a very powerful concept for sure). That's why I think that there won't be something like a characteristic function in general Markov cats.
But it's entirely conceivable that there is something like a characteristic function for states in certain Markov categories on objects which "look like" the real numbers. I'm pretty sure though that nobody knows how to do this so far, so it's one of the many open questions.
It's also possible that there will be another concept for suitable Markov categories which is different from the characteristic function, but can replace the latter for some of its purposes, such as in the proof of the central limit theorem. But all of this is speculation at the moment, so just take it as a description of the scope of conceivable possibilities.
Ok. Thank you for your feedback @Tobias Fritz . Another question if I may... from what I have seen so far, deterministic maps are defined as maps that respect copying but in terms of string diagrams they are drawn the same as probabilistic ones, so, how can we differentiate between the two? is it just by looking at the context in which they occur ?
It's easy to write down a string diagram that expresses the fact that a map respects copying. I guess if you want you can draw such maps in a different color or something.
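The copying condition can also be checked numerically in the finite case. Below is a minimal Python sketch, assuming we work with finite sets and row-stochastic matrices (the thread is about general Markov categories, so this is only one concrete instance). The function name `respects_copying` is made up for the illustration.

```python
def respects_copying(f, tol=1e-12):
    """Check that copying after f equals applying f to both copies.

    f is a row-stochastic matrix with f[x][y] = P(y | x).
    "f then copy" puts mass f[x][y] on the diagonal pair (y, y).
    "copy then f on each copy" puts mass f[x][y1] * f[x][y2] on (y1, y2).
    """
    for row in f:
        n = len(row)
        for y1 in range(n):
            for y2 in range(n):
                f_then_copy = row[y1] if y1 == y2 else 0.0
                copy_then_f_twice = row[y1] * row[y2]
                if abs(f_then_copy - copy_then_f_twice) > tol:
                    return False
    return True


# A function written as a 0/1 matrix, and a genuinely random channel.
swap = [[0.0, 1.0], [1.0, 0.0]]
noisy = [[0.5, 0.5], [0.0, 1.0]]

swap_is_deterministic = respects_copying(swap)
noisy_is_deterministic = respects_copying(noisy)
```

In this finite setting the check passes exactly when every row is a point mass, that is, when the channel is an ordinary function.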
Oh. I was thinking about doing something like that... but probably my question should be, why so far people haven't done that? Is it because you can just look at the context, so there is no need for differentiating them diagrammatically?
my question should be, why so far people haven't done that?
I don't know why or even if they haven't, but go ahead and do it - if you're the first, maybe people will use your notation.
John Baez said:
I don't know why or even *if* they haven't, but go ahead and do it - if you're the first, maybe people will use your notation.
I haven't seen it... but my knowledge is very limited. Maybe @Tobias Fritz knows better.
There was some brief discussion on this topic here. (I'm in favour of graphically distinguishing stochastic morphisms from deterministic ones, as I think it makes things easier to follow, but Tobias gave some reasonable points against it.)
Thank you @Nathaniel Virgo for the input.
I'm trying to adapt the relative entropy construction of https://arxiv.org/pdf/1402.3067.pdf and https://www.cs.mcgill.ca/~prakash/Pubs/ccre.pdf to Markov categories. In the papers, a morphism between objects in $\mathsf{FinStat}$ is a pair $(f, s)$, and the RE functor sends it to $S(p, s \circ q)$. Why do we need the $f$ morphism? Can we do it with just $s$, leaving out $f$?
Quite generally, John's point is probably the best: if you invent a piece of notation and like it, then go ahead and use it! If it's useful and makes things more intelligible to others, then others will start using it too.
During the writing on our latest paper, we had indeed considered using a separate notation for deterministic morphisms. As in the discussion that @Nathaniel Virgo has linked to, there are advantages and disadvantages, and in the end we decided against doing it. But I'd be curious to see a paper which does it, in order to see how it pans out in practice.
Hakimi Rashid said:
I'm trying to adapt the relative entropy construction of https://arxiv.org/pdf/1402.3067.pdf and https://www.cs.mcgill.ca/~prakash/Pubs/ccre.pdf to Markov categories. In the papers, a morphism between objects in $\mathsf{FinStat}$ is a pair $(f, s)$, and the RE functor sends it to $S(p, s \circ q)$. Why do we need the $f$ morphism? Can we do it with just $s$, leaving out $f$?
Interesting question! Perhaps there's a way to do without it, which would be very interesting to see. But in $\mathsf{FinStat}$, the role played by the map $f$ is that it is measure-preserving between $p$ and $q$, meaning that $q = f \circ p$, which you can regard as the definition of $q$. Then you may just as well write $S(p, s \circ f \circ p)$. So when putting it like this, it's actually the $q$ which is not needed!
I see. By the way, I'm also considering the kind of solution that would be brought about by following enriched Markov category similar to this thesis https://www.erischel.com/documents/mscthesis.pdf. Just that, I don't know which would be easier and generalize more to other statistical distances. Maybe to define other distances, we need to consider different categories on which to enrich Markov category?
Hakimi Rashid said:
I'm trying to adapt the relative entropy construction of https://arxiv.org/pdf/1402.3067.pdf and https://www.cs.mcgill.ca/~prakash/Pubs/ccre.pdf to Markov categories. In the papers, a morphism between objects in $\mathsf{FinStat}$ is a pair $(f, s)$, and the RE functor sends it to $S(p, s \circ q)$. Why do we need the $f$ morphism? Can we do it with just $s$, leaving out $f$?
There are lots of different categories, good for different things. In the category that Tobias and I called $\mathsf{FinStat}$, a morphism describes 1) how a state of the system being observed deterministically produces an observation, and 2) a recipe for guessing the state of the system from an observation. Part 1) is the measure-preserving function $f$ from $X$ to $Y$, and part 2) is the stochastic map $s$ from $Y$ back to $X$.
I think if you read the introduction to our paper you'll see that both of these are used to compute relative entropy. We give the formula, and it involves both $f$ and $s$. We also explain what's going on.
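For a concrete feel of how both parts enter, here is a minimal Python sketch on finite sets. It assumes the relative entropy of a morphism is computed as $S(p, s \circ q)$ with $q$ the pushforward of $p$ along $f$; the example numbers and the helper names are made up for the illustration.

```python
import math


def pushforward(p, f, n_out):
    """Distribution q on Y obtained by pushing p forward along f : X -> Y.

    f is a list with f[x] the index of the observation produced by state x.
    """
    q = [0.0] * n_out
    for x, px in enumerate(p):
        q[f[x]] += px
    return q


def compose(s, q):
    """(s . q)[x] = sum over y of s[y][x] * q[y], where s[y] is a
    distribution on X (the guess for the state given observation y)."""
    n_x = len(s[0])
    return [sum(s[y][x] * q[y] for y in range(len(q))) for x in range(n_x)]


def relative_entropy(p, r):
    """S(p, r) = sum of p[x] * ln(p[x] / r[x]) over x with p[x] > 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, r) if a > 0)


p = [0.5, 0.25, 0.25]          # true distribution on X = {0, 1, 2}
f = [0, 0, 1]                  # observation map X -> Y = {0, 1}
q = pushforward(p, f, 2)       # induced distribution on Y

# A hypothesis: seeing y = 0, guess x = 0 or x = 1 with equal odds.
s = [[0.5, 0.5, 0.0],
     [0.0, 0.0, 1.0]]
re_guess = relative_entropy(p, compose(s, q))

# The hypothesis that matches p exactly on each fibre of f.
s_opt = [[2 / 3, 1 / 3, 0.0],
         [0.0, 0.0, 1.0]]
re_opt = relative_entropy(p, compose(s_opt, q))
```

The first hypothesis gives a positive value, while the second reproduces `p` and gives a value that is zero up to rounding.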
Thank you for the input @John Baez . I've read the introduction part of that paper and if I understand it correctly, you have framed the definition on the ground of the scenario or example of how RE might be used. However, in the case of the setting of application I'm considering, I don't think I have the morphism. I want to compare 2 time series from sensor recordings and that have values in .
So, in this setting, I want to know how different / similar they are to each other.
Tobias Fritz said:
Briefly, I would say that we can't define characteristic functions in Markov categories just yet. Markov categories are much more general than plain old probability theory. Characteristic functions on the other hand are a concept rather specific to real-valued random variables in the traditional sense (although a very powerful concept for sure). That's why I think that there won't be something like a characteristic function in general Markov cats.
But it's entirely conceivable that there is something like a characteristic function for states in certain Markov categories on objects which "look like" the real numbers. I'm pretty sure though that nobody knows how to do this so far, so it's one of the many open questions.
It's also possible that there will be another concept for suitable Markov categories which is different from the characteristic function, but can replace the latter for some of its purposes, such as in the proof of the central limit theorem. But all of this is speculation at the moment, so just take it as a description of the scope of conceivable possibilities.
I'm guessing the reason is that if we apply the Fourier transform to a random variable, we land in another category outside of the Markov category. Is this true?
Hakimi Rashid said:
I'm guessing the reason is that if we apply the Fourier transform to a random variable, we land in another category outside of the Markov category. Is this true?
Not exactly. The reason is because there is no such thing as "apply the Fourier transform" in the first place. How would that be defined?
I was thinking of defining it as functor. Maybe I don't quite grasp the whole concept just yet.
Hi again. I'm still in the dark of why it does not make sense to define fourier transform of random variable as functor from Markov category to 'another' category CF category: and its inverse as . So, we can have monad as composition of the two . Would you kindly walk me through it? Thank you.
When you propose a new mathematical idea, then you need to explain why you think that it does make sense, and this seems to be missing here. In other words, what does your proposal have to do with characteristic functions or the Fourier transform at all? Which categories and which functors do you need to pick in order to obtain a categorical description of the classical Fourier transform?
So what I know is that a real-valued random variable is, categorically speaking, a morphism in the Markov category Stoch (or BorelStoch), which is the Kleisli category of the Giry monad on measurable spaces (or standard Borel spaces). What I don't know is what your and might be, and how these would describe the Fourier transform.
Also, if is the inverse of , then the induced monad is the identity monad, which does not seem interesting. Perhaps you mean adjoint rather than inverse?
Tobias Fritz said:
So what I know is that a real-valued random variable is, categorically speaking, a morphism in the Markov category Stoch (or BorelStoch), which is the Kleisli category of the Giry monad on measurable spaces (or standard Borel spaces). What I don't know is what your and might be, and how these would describe the Fourier transform.
So, here you mean I need to define a category of characteristic function in such a way that is a functor that will capture the notion of Fourier transform, right?
Tobias Fritz said:
Also, if is the inverse of , then the induced monad is the identity monad, which does not seem interesting. Perhaps you mean adjoint rather than inverse?
Why does the identity monad not seem interesting in this case? If adjoint, what could it mean concretely?
I thought the main idea of using fourier transform is to work in 'transformed' space that completely preserve the information of the original space. So you can do something within this space that would be hard to do in original space and can transform back the result to the original space. So, identity seems 'correct', right?
Hakimi Rashid said:
So, here you mean I need to define a category of characteristic function in such a way that is a functor that will capture the notion of Fourier transform, right?
All I mean is that you need to explain the meaning and significance of your idea in order for anyone else to be able to comment on it. One way to achieve that is to explain which particular category and which functor you have in mind.
Why does the identity monad not seem interesting in this case? If you mean adjoint, what could that mean concretely?
The identity monad is never interesting, just like the identity functor isn't an interesting functor or the one-element group isn't an interesting group. There isn't much that you can do with the identity monad on any category C: both its Kleisli category and its Eilenberg-Moore category are just C again.
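To spell out that last claim, here is a short sketch using only the standard definitions. The symbols $T$, $\eta$, $\mu$, $a$ are notation introduced for this illustration, not taken from the thread.

```latex
% Identity monad on C: the functor, unit and multiplication are all identities.
T = \mathrm{Id}_{\mathcal{C}}, \qquad \eta_X = \mathrm{id}_X, \qquad \mu_X = \mathrm{id}_X .

% Kleisli category: a morphism X -> Y is a C-morphism X -> T(Y) = Y,
% and Kleisli composition reduces to ordinary composition.
\mathcal{C}_T(X, Y) = \mathcal{C}(X, TY) = \mathcal{C}(X, Y), \qquad
g \circ_T f = \mu_Z \circ T(g) \circ f = g \circ f .

% Eilenberg-Moore category: an algebra a : TX -> X must satisfy the unit law,
% which forces a to be the identity, so algebras are just objects of C.
a \circ \eta_X = \mathrm{id}_X \;\Longrightarrow\; a = \mathrm{id}_X .
```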
I thought the main idea of using the Fourier transform is to work in a 'transformed' space that completely preserves the information of the original space. So you can do something within this space that would be hard to do in the original space and then transform the result back to the original space.
That sounds like a good description. But this doesn't necessarily mean that it's appropriate to do the same thing at the categorical level. The categorical generalization of a standard concept often takes quite a different form than the original concept. For example, epimorphisms are a categorical generalization of surjective functions, but the definition of epimorphism and the definition of surjective function are quite different.
Maybe I should start with some context. I am working with time series data of random variables. Many methods have been developed so far to compute the statistical 'distance/similarity' between 2 time series. These include KL divergence, mutual information, etc. These time series can also be represented in the frequency domain by applying the Fourier transform to each time series. We can then also compute the 'distance/similarity' between them within the frequency domain. Examples include coherence and many more.
For the time domain, I think a Markov category is one suitable category that I can work with. But for the frequency domain, I thought there should exist a 'dual' of the Markov category, or maybe another category that is somehow connected to the original Markov category.
My aim is to distill/unify the process of:
1) 'embedding' time series into a particular representation
2) using this representation structure to compute a 'statistical distance'
Am I on the right track in using Markov categories so far? Or should I consider another direction?
Tobias Fritz said:
All I mean is that you need to explain the meaning and significance of your idea in order for anyone else to be able to comment on it. One way to achieve that is to explain which particular category and which functor you have in mind.
Say, what if the objects in the category CF are the Fourier-transformed pdfs of random variables, and the morphisms in CF are the Fourier transforms of morphisms between random variables in the Markov category? Would that be correct for what I'm aiming for?
Yes, that clarifies things for me. If I understand correctly, the Fourier transform that you're taking is not the characteristic function, right? Because that would be $t \mapsto \mathbb{E}[e^{itX}]$ for one fixed random variable $X$, and this function is no longer random because the expectation value has been taken. While what you're doing is to take the Fourier transform in time separately for every realization of the process, which is then itself a random function. Right?
As far as Markov categories go, a number of people have already asked about developing stochastic process theory within that framework, but so far this doesn't exist yet. I can imagine that the equivalence that you describe has a categorical description in Markov category terms, but instead of it being an equivalence or isomorphism between two Markov categories, to me it looks more like an isomorphism of objects internal to a single Markov category.
In any case, for thinking about such a thing I guess it would help to already have some stochastic process theory in place for Markov categories — so you may want to think about this more generally first. How do you define a Markov category of stochastic processes? What are the morphisms? And how can one formulate and prove some of the basic theorems on stochastic processes?
Tobias Fritz said:
Yes, that clarifies things for me. If I understand correctly, the Fourier transform that you're taking is not the characteristic function, right? Because that would be $t \mapsto \mathbb{E}[e^{itX}]$ for one fixed random variable $X$, and this function is no longer random because the expectation value has been taken. While what you're doing is to take the Fourier transform in time separately for every realization of the process, which is then itself a random function. Right?
Ah, yes. I realize my mistake. I thought what I'm looking for is the characteristic function, since both involve taking a Fourier transform. But what you describe is closer to what I'm aiming for. It should be random.
So, the Fourier transform is deterministic, but in combination with the pdf, the result is still random. Does this make sense? Because the result has a different sample space (frequency) than the original sample space.
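A minimal sketch of that point, assuming a toy Gaussian random walk as the process (the helper names `dft` and `random_walk` are hypothetical and written only for this illustration): the transform itself is a fixed deterministic map, and the randomness of the output comes entirely from the randomness of the realization it is applied to.

```python
import cmath
import random

def dft(x):
    """Discrete Fourier transform of a finite sequence (direct O(n^2) formula)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def random_walk(n, rng):
    """One realization of a toy process: a Gaussian random walk of length n."""
    path, value = [], 0.0
    for _ in range(n):
        value += rng.gauss(0.0, 1.0)
        path.append(value)
    return path

rng = random.Random(0)
realizations = [random_walk(16, rng) for _ in range(3)]

# Apply the same deterministic map to every realization separately.
spectra = [dft(path) for path in realizations]

# Determinism: transforming the same realization twice gives the same spectrum.
deterministic = dft(realizations[0]) == spectra[0]

# Inherited randomness: different realizations give different spectra.
spectra_differ = spectra[0] != spectra[1]
```

Here each entry of `spectra` is the frequency-domain version of one sample path, so the collection of spectra is again a sample from a random object, now indexed by frequency instead of time.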
Tobias Fritz said:
As far as Markov categories go, a number of people have already asked about developing stochastic process theory within that framework, but so far this doesn't exist yet. I can imagine that the equivalence that you describe has a categorical description in Markov category terms, but instead of it being an equivalence or isomorphism between two Markov categories, to me it looks more like an isomorphism of objects internal to a single Markov category.
So, the objects (Fourier-transformed random variables (RVs) of a stochastic process) live in a Markov category since they are still RVs?
Yep, that all sounds right to me :smile:
Thank you .
Hi, again. Regarding the relative entropy (RE) functor which was developed in the paper (https://arxiv.org/pdf/1402.3067.pdf): you and @John Baez have considered the case where there is one system and one measurement, with morphisms between them regarded as 'measuring process' and as 'hypothesis'. Given a true probability and a 'prior' , RE is defined such that it is the amount of information gained when we update our prior to the true probability.
Whereas, in my case, I am considering 2 systems, each with probabilities on their states and . Both send signals that can be recorded as time series. They may communicate with each other, thus influencing each other's states with a certain probability. I want to define RE between them by adapting the definition provided by the paper.
First, define channels and to represent the communications between them. Then, RE is a functor that sends objects to the single object of category , morphism to and morphism to . Would this work?
Also, I want to define mutual information and correlation between the two systems following the similar line of thinking...
Have you checked that your definition does what you need in a simple setting, like FinStoch?
No, I haven't. Could you show me how or point the way? I'm no mathematician. My background is in biology, but I want to use category theory to describe part of the data analysis involved in my current research.
Are your channels stochastic matrices? If so, you can work in FinStoch (the category with finite sets as objects and stochastic matrices as morphisms) and try to prove your candidate functor respects identities and composition.
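To make the FinStoch setting concrete, here is a minimal sketch in which morphisms are stochastic matrices and composition is matrix multiplication. The column-stochastic convention (entry `m[i][j]` is the probability of output `i` given input `j`), the helper names, and the two example channels `f` and `g` are all assumptions made for this illustration. A candidate functor would then have to send `compose(g, f)` to the composite of the images of `g` and `f`, and identities to identities.

```python
from fractions import Fraction as F

def is_stochastic(m):
    """Entries are non-negative and every column sums to exactly 1."""
    rows, cols = len(m), len(m[0])
    nonneg = all(m[i][j] >= 0 for i in range(rows) for j in range(cols))
    normalized = all(sum(m[i][j] for i in range(rows)) == 1 for j in range(cols))
    return nonneg and normalized

def compose(g, f):
    """Composite 'g after f', i.e. the matrix product of g with f."""
    inner, cols = len(f), len(f[0])
    return [[sum(g[i][k] * f[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(len(g))]

def identity(n):
    """Identity morphism on an n-element set."""
    return [[F(int(i == j)) for j in range(n)] for i in range(n)]

# Hypothetical channels: f goes from a 2-element set to a 3-element set,
# g goes from a 3-element set to a 2-element set.
f = [[F(1, 2), F(0)],
     [F(1, 2), F(1, 3)],
     [F(0),    F(2, 3)]]
g = [[F(1), F(1, 4), F(0)],
     [F(0), F(3, 4), F(1)]]

gf = compose(g, f)  # a channel from the 2-element set to itself
```

Exact fractions are used so that the checks "columns sum to 1" and "composing with an identity changes nothing" hold exactly, without floating-point tolerance.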
Ok. That seems workable for me. Thank you.
Is FinStoch enough, though? Because I want to use the Fourier transform...
I'm afraid nobody is answering your question because nobody knows what it means. "Is FinStoch enough?" is vague. It may be hard to turn this into a mathematically precise question, but if you do, then more people will answer it.
Oh, I see. I think I somewhat know the answer... In practice, we can only record a finite amount of signal and can only analyze a finite amount of data, so working with the category FinStoch should be enough. Regarding the Fourier transform, in practice we use the fast Fourier transform algorithm, which computes the discrete Fourier transform, and we apply the transform to finite time series, so again FinStoch should be the right category. Is this correct?
When working in the FinStoch category, we limit ourselves to finite sets, right?
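As a small illustration of the finite setting, here is a sketch of the discrete Fourier transform and its inverse on a finite time series, written directly from the defining sums rather than with a fast algorithm. The function names and the sample values in `signal` are hypothetical. The round trip recovers the original series up to floating-point error, which is the sense in which the transformed representation keeps all the information.

```python
import cmath

def dft(x):
    """Discrete Fourier transform: X[k] = sum_t x[t] * exp(-2*pi*i*k*t/n)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Inverse transform: x[t] = (1/n) * sum_k X[k] * exp(2*pi*i*k*t/n)."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

# Hypothetical finite time series of length 8.
signal = [0.0, 1.0, 0.5, -0.25, 2.0, -1.0, 0.0, 3.0]

recovered = idft(dft(signal))

# Largest deviation between the original series and the round-trip result.
max_error = max(abs(r - s) for r, s in zip(recovered, signal))
```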
Yes, that's what the "Fin" in "FinStoch" means.
Thank you. Maybe I should limit myself to finite category since that is what we do in practice.
It makes a bunch of theorems easier to prove, which is why Tobias and I focused on that case. Measure theory is simple on finite sets.
To be honest... I was expecting to grab low-hanging fruit when I stumbled upon category theory. I was expecting that people had worked on all the parts that are related to what I do, and that I could just cite their theorems/proofs and compile them together. When I found Evan's thesis, https://arxiv.org/pdf/2006.08945.pdf, I hit the jackpot. But it seems the work does not cover correlation and other statistical distances/similarities. Are you aware of any work that has been done on this or related to this?
Nope, ask him.
There's a huge amount of work on category theory, but Evan - a young guy who just finished grad school - is one of the first to apply it to statistics. You've hit the jackpot if you want to help develop an interesting new branch of mathematics... but not if you just want to "cite theorems".
Anyway, you should ask @Evan Patterson, not me, what work he's aware of in this area. He's the expert.
This is probably not very helpful, but I remember Tom Leinster had some interesting articles about population and diversity. It is not traditional statistics I guess, but I actually had some success applying that to finance replacing populations of penguins with financial securities. When markets all move together, e.g. during a crisis, the effective diversity of the population decreases. I used it to identify extreme conditions for stress testing.
Did you use any distance measure/ statistical divergence in that work?
I actually had some success applying that to finance replacing populations of penguins with financial securities.
Wow, you financed replacing populations of penguins with financial securities? What a typical Wall Street thing to do! :upside_down:
You might also look at the paper here, which gives a relation between the cumulants of a distribution (mean, variance, central moment, ...) and homotopy theory, though it is not an easy paper.
https://arxiv.org/abs/1302.3684
If you are interested, here are the old articles that inspired me:
Thank you @Spencer Breiner and @Eric Forgy . I'll take my time and go thru the papers.
John Baez said:
I actually had some success applying that to finance replacing populations of penguins with financial securities.
Wow, you financed replacing populations of penguins with financial securities? What a typical Wall Street thing to do! :upside_down:
Collateralized Penguin Obligations. AAA rated by Moodys :nerd:
Penguins may go extinct, but Collateralized Penguin Obligations will remain. :penguin: :penguin: :penguin:
CPOs precipitated the last global financial crisis, but don't blame me :sweat_smile:
FYI, Tom's work on this is now in book form and due to be published soon.
Fawzi Hreiki said:
FYI, Tom's work on this is now in book form and due to be published soon.
Cool. I always liked these ideas. Good to see them coming to fruition :+1:
Semi-seriously, diversification is important in portfolio management and Tom's ideas on diversity certainly apply to financial time series where each time series is like a different species. That is my second "first" in mathematical finance I guess (although never published). The first "first" was applying noncommutative geometry to finance :nerd:
Hi, all. Another question. How can we define algebraic operations on random variables in a Markov category? For example, addition, subtraction, multiplication, etc.
would this work: 
1)for multiplication, ,
2)addition, 
3)subtraction, 
4)Division, ?
In https://arxiv.org/pdf/2006.08945.pdf, interacting supply was introduced so that several structures, such as vector spaces, can be defined within a Markov category. In this case, if we supply a Markov category with vector spaces, can we then define an inner product within the Markov category?
That's @Evan Patterson's thesis.
Not directly, since an abstract vector space doesn't come with an inner product. You can get partway there by introducing a symmetric bilinear map, since those properties are purely equational. At least within the framework of my thesis, you can't express the positive definiteness of an inner product.
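For reference, the distinction can be written out for a real vector space as follows. This is a sketch of the standard inner product axioms; the symbols $x, y, z$ for vectors and $a, b$ for scalars are notation chosen here.

```latex
% Purely equational axioms: symmetry and (bi)linearity.
\langle x, y \rangle = \langle y, x \rangle ,
\qquad
\langle a x + b y, z \rangle = a \langle x, z \rangle + b \langle y, z \rangle .

% Positive definiteness: involves an inequality and an implication,
% so it is not an equation between maps.
\langle x, x \rangle \ge 0 ,
\qquad
\langle x, x \rangle = 0 \iff x = 0 .
```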
Thank you for your reply, Evan. I'm gonna need some time to digest it. In the meantime, do you happen to know the answer to my question regarding the algebraic operations on random variables?
John Baez said:
You've hit the jackpot if you want to help develop an interesting new branch of mathematics... but not if you just want to "cite theorems".
Thank you John, that feels welcoming. But, the current me is really not equipped to develop new math. I don't have formal higher math education/training. Currently, I can only digest and hopefully understand proven theorems and link them together and, if applicable, use them in my work.
I understand. Then you may be somewhat frustrated: perhaps not all the tools you want exist yet. You can work with the tools that exist, or try to find a mathematician collaborator who can develop new tools that you need. To work with mathematicians it helps to state your needs as precisely as possible, in a lot of detail, starting by describing your background assumptions. I often don't understand what you're saying.
It sounds like Evan understood your last question, and his reply makes a lot of sense to me. He said the definition of inner product isn't purely equational: it involves an inequality too. From this I guess he has a framework for introducing purely equational concepts into the theory of random variables. This makes sense, because there's a lot of math developed for purely equational theories, like "Lawvere theories".
All this is just a bunch of guesses based on his reply: I haven't read the paper of his that you're talking about. I mention my guesses just to show that a mathematician's view may be very different than yours: what's clear to you may be mysterious to the mathematician, and what's mysterious to you may be clear to the mathematician. So, you need to put a lot of energy into clear communication if you want to reach mathematicians.
Sorry to bother you again, @Evan Patterson. In your thesis, you mentioned loss function in the notes and references section of chapter 3. From the string diagram, I think we can define it as . Is this true?
Is the loss function deterministic?
Can I use it to compare between 2 variables?
The loss function has the form : for a given parameter , what is the loss under action ? Decision rules have the form : given the data , what action do I take?
AFAIK, people always take the loss to be deterministic, but you could make it randomized. Decision rules can also be randomized and sometimes actually are. For example, resampling procedures like cross-validation are often randomized. In the end, decision theorists study the risk (expected loss), which averages out the randomness in the samples, the decision rule, and the loss (if you were to allow randomized losses).
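In the usual decision-theoretic notation, the risk mentioned above can be sketched as follows. All symbols here are assumptions for illustration and not necessarily the notation of the thesis: $\Theta$ is the parameter space, $\mathcal{X}$ the data space, $\mathcal{A}$ the action space, $p_\theta$ the sampling distribution, $L$ the loss and $d$ the decision rule.

```latex
% Deterministic loss and deterministic decision rule:
L : \Theta \times \mathcal{A} \to \mathbb{R},
\qquad
d : \mathcal{X} \to \mathcal{A},
\qquad
R(\theta, d) = \mathbb{E}_{x \sim p_\theta} \bigl[ L(\theta, d(x)) \bigr] .

% Randomized decision rule d(\,\cdot \mid x), a distribution over actions given the data:
R(\theta, d) = \mathbb{E}_{x \sim p_\theta} \, \mathbb{E}_{a \sim d(\cdot \mid x)} \bigl[ L(\theta, a) \bigr] .
```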
Thank you for the input @Evan Patterson . I was thinking of defining relative entropy similar to the form of the loss function. Since KL divergence is the expected log difference between 2 probabilities and , can we define in Markov category?
where is the expectation and is the log difference between and .
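For finite distributions, the "expected log difference" reading of KL divergence can be sketched directly. This is the classical formula on probability vectors, not a construction inside a Markov category; the function name `kl_divergence` and the example vectors `p` and `q` are hypothetical.

```python
import math

def kl_divergence(p, q):
    """D(p || q) = sum_i p_i * (log p_i - log q_i), measured in nats.

    This is the expectation under p of the log difference between p and q.
    Returns infinity if q assigns zero probability to an outcome that p allows.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # convention: 0 * log(0 / q_i) = 0
        if qi == 0:
            return math.inf
        total += pi * (math.log(pi) - math.log(qi))
    return total

# Hypothetical distributions on a 3-element set.
p = [0.5, 0.25, 0.25]
q = [0.25, 0.5, 0.25]

d_pq = kl_divergence(p, q)
```

Note that the expectation is taken with respect to the first argument, so the two distributions do not play symmetric roles in the definition.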