I have been reading about Markov categories... From what I understand, a Markov category is a semicartesian monoidal category where every object has a comonoid structure. So objects have morphisms like copy, which is the comultiplication, and delete, which is the counit, right?
1) It seems to me it also has unit morphisms $p : I \to X$, which give probability distributions on $X$. Just curious: what would it be if we equipped objects with monoid structures? What would the multiplication $m : X \otimes X \to X$ do?
2) Regarding the delete morphism $\mathrm{del}_X : X \to I$: does this morphism discard the whole data or just the probability distribution on $X$, so that you get back the sample?
3) Does probability come with any dual structure akin to the dual vector space?
Indeed, unit morphisms exist and have that interpretation, see page 11 in Fritz19. If you wanna have a monoid structure mirroring the comonoid, multiplication should have type $m : X \otimes X \to X$ for every object $X$ - I think you can't just pair any two different objects, because what would a multiplication on a pair of different objects even mean?
I guess I was thinking along the lines of defining a statistical distance like the KL divergence, or maybe the Fisher information metric, to compare two distributions within a Markov category...
I assume you can define the KL divergence for any two parallel arrows but I don't think that's been done in this framework (yet)
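For concreteness, the classical quantity such a definition would generalize: for parallel arrows $f, g : A \to X$ in $\mathsf{FinStoch}$, one could take the pointwise relative entropy (a sketch of a candidate definition, not something from the literature):
$$
D_{\mathrm{KL}}(f \,\|\, g)(a) \;=\; \sum_{x \in X} f(x \mid a) \, \log \frac{f(x \mid a)}{g(x \mid a)}, \qquad a \in A.
$$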
Thank you for your reply. Would this work: we define $\epsilon_X : X \to \mathbb{R}$ as a generalized version of $\mathrm{del}_X$, i.e. it maps $X$ to scalars instead of deleting, just like the counit $\epsilon : V \to k$ in $\mathsf{Vect}$. Then build the KL divergence as some composite of these maps?
Do we need the multiplication $m : X \otimes X \to X$? Does it define joint probability? Would dropping it out of the above definition do the same job?
A construction along those lines might work in $\mathsf{Vect}$ because the monoidal unit is $\mathbb{R}$, but in $\mathsf{FinStoch}$, for example, the monoidal unit is the one-element set, so you cannot "map to scalars" - the discard map $X \to I$ is unique.
Joint probabilities on the pair $X \otimes Y$ are defined as arrows $I \to X \otimes Y$ - at least in $\mathsf{FinStoch}$, but I believe this is true in any Markov category.
Hi both! I was just about to write a similar reply, so thanks for having mentioned that already @Javier Prieto (I'm unfortunately too busy these days to spend much time here...)
Quite generally, I think that the best methodology in applied category theory is to look at and understand the mathematical structures in the applied context - in probability, in this case - and then to abstract to categorical structures from there. So in the case of KL divergence, I don't think that a description of it as a composite of that form will exist, since I don't know what the individual meaning of the component maps in probability theory would be. At least one of them would have to be non-linear, since KL divergence is non-linear, which is at odds with the fact that morphisms in Markov categories (and variants thereof) are taken to be linear in the probabilities. Perhaps one can entertain categories with morphisms that act non-linearly, but then one needs to answer the question of what their probabilistic meaning and significance is.
One possible use for monoid structures would be to axiomatize categories of monoid-valued random variables. For example with real-valued random variables, whose distributions are morphisms $I \to \mathbb{R}$, we may be interested in adding two of them. This amounts to composing their joint distribution $I \to \mathbb{R} \otimes \mathbb{R}$ with the addition map $+ : \mathbb{R} \otimes \mathbb{R} \to \mathbb{R}$. I don't think that anyone has thought about such categories yet, and prior to doing so one should have an idea of what one is trying to achieve with it. (In many probability theory statements, like the laws of large numbers, one is interested in averaging a bunch of random variables rather than adding them. Although averaging and adding of real numbers are operations which differ only by a factor of $n$, I've come to regard their categorical generalizations as quite different kinds of beasts: addition is captured by monoid structures, while averaging has more to do with representable Markov categories.)
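As a toy illustration of this recipe, here is a minimal Python sketch of composing a finite joint distribution with the addition map; the helper name `pushforward` and the example law are made up for illustration:
```python
from collections import defaultdict

def pushforward(joint, op):
    """Push a finite joint distribution {(x, y): prob} forward along a
    binary operation, giving the distribution of op(X, Y)."""
    out = defaultdict(float)
    for (x, y), p in joint.items():
        out[op(x, y)] += p
    return dict(out)

# A joint state I -> R (x) R, here a uniform joint law on two coordinates.
joint = {(1, 1): 0.25, (1, 2): 0.25, (2, 1): 0.25, (2, 2): 0.25}

# Composing the joint state with + gives the law of X + Y.
print(pushforward(joint, lambda x, y: x + y))  # {2: 0.25, 3: 0.5, 4: 0.25}
```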
Hopefully some of this gives some additional insight beyond what Javier already said.
Thank you @Tobias Fritz and @Javier Prieto for your replies. Does this mean that we cannot define any statistical distance at all in a Markov category? Or is there some statistical distance that is 'natural' and definable in a Markov category?
If, let's say, we want a categorical framework for statistical distance, do we need to start from scratch and define a new category, or can we add structure to a Markov category, by somehow enriching it, in order to capture notions of statistical distance?
This is not an answer but it's pointing at one: you may find this stream and the links therein interesting. In particular, there is this paper
John C. Baez, Tobias Fritz, A Bayesian Characterization of Relative Entropy
in which relative entropy is defined as a functor from $\mathbf{FinStat}$ to $[0,\infty]$. The definition of $\mathbf{FinStat}$ is a bit involved, but I think it embeds into $\mathbf{FinStoch}$ because the morphisms are essentially certain diagrams involving stochastic maps. I don't know if this functor can be extended/defined beyond the finite case.
I don't have a definite answer on the last question either, but very roughly: distances are real numbers, and finding any general categorical construction which outputs real numbers is tricky. It would have to involve some kind of limit. Assuming a suitable kind of enrichment would indeed be the easier and perhaps more sensible thing to do, and I would hope that many of the existing results for Markov categories can be generalized to the enriched case, and thereby apply in situations where one wants to talk about e.g. approximate equality of distributions.
On the other hand, although getting intrinsic categorical notions of distance is difficult, @Eigil Rischel has recently proposed an intrinsic topology on the hom-sets of every Markov category (satisfying some mild conditions). That this is possible at all has been very surprising to me. It's currently work in progress, so I'm afraid that I can't say more at this point, simply because we don't really know much more.
Thank you @Javier Prieto for sharing those resources and @Tobias Fritz for sharing the current progress on this issue. I really hope and pray that these will be sorted out and published soon, because most of my current work can be captured by Markov categories, I think.
I am reading this paper https://arxiv.org/pdf/1709.00322.pdf, 'Disintegration and Bayesian Inversion via String Diagrams' by Cho and Jacobs. In the paper, under the 7th section - Beyond causal channels - they mention 'enlarging' the CD category, which enables the notions of scalars (morphisms $I \to I$) and effects (morphisms $X \to I$). What do these concepts mean intuitively in a probability setting? Can we use them somehow to define KL divergence?
That just refers to including probability distributions and channels which are not normalized. When they say "causal", they're referring to a categorical formulation and generalization of the normalization of probability.
After reading your paper with @John Baez on the relative entropy functor, I've come to realize that any distance function is a function that kind of lives in between two categories, if I understand it correctly. The input of the function lives in one category - for example a Markov category, in the case of probability distributions - and the output lives in the realm of the monoid of real numbers. An exception would be vector spaces, because vectors and the output of an inner product can live in the same category. Is this right?
That's one way to do it, and it's one which seems to capture entropy and KL divergence particularly accurately. But it's also possible to work with categories enriched in metric spaces, so that one has a given measure of distance between any two morphisms (between the same two objects). For example, the category of channels has a canonical enrichment like this if one measures the distance between two channels by (the channel generalization of) the total variation distance. Also the Wasserstein distance can be treated like this. Of course, the difference is that these distances are then additional structure on the category and not canonically determined from the categorical structure alone.
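A minimal sketch of this enrichment in the finite case, with channels as row-stochastic matrices and the distance between two parallel channels taken to be the worst-case total variation distance over inputs (the helper below is hypothetical, not from any paper):
```python
import numpy as np

def tv_channel_distance(f, g):
    """Distance between two channels A -> X given as row-stochastic
    matrices (row a = the output distribution on input a): the worst-case
    total variation distance between output distributions over inputs."""
    return 0.5 * np.abs(f - g).sum(axis=1).max()

f = np.array([[0.9, 0.1], [0.2, 0.8]])
g = np.array([[0.7, 0.3], [0.2, 0.8]])
print(tv_channel_distance(f, g))  # 0.2
```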
Can we then just abstract this notion of distance and similarity and capture it in a single categorical construct that can cover all distance measures?
Would structured cospans do the job?
Categories enriched in metric spaces are indeed a single categorical concept which covers a lot of distance measures. (KL divergence is one of the exceptions, due to the failure of the triangle inequality.) But this observation in itself doesn't really do much, since it's just a definition. It's about the same as saying that the definition of metric space covers all distance measures. That's true, but has little use in itself; it's more like a starting point for the development of a theory.
I don't see any relation to structured cospans, but perhaps someone who has worked with those can say more.
Not me.
Owh... I was thinking about a construction akin to a functor category in which the slots are not fixed but can be filled with various compatible categories. So we can have a cospan like so: $X \xrightarrow{F} [-,-] \xleftarrow{G} Y$, which results in the functor category $[X, Y]$. Is there such a thing? Or is it total nonsense?
It doesn't parse - that is, it doesn't type-check. $[-,-]$ isn't a category if it doesn't have anything plugged into it, so what kind of morphisms should $F$ and $G$ be?
Ok, it is total nonsense then. Is there any categorical construction that can achieve a similar goal? Or is the goal itself not achievable?
What if I define a 2-category with the following diagram and then define a 2-functor to another 2-category? Will it achieve the goal of having variable categories?
In general, we can think of the construction $[-,-]$ as a 2-functor $\mathbf{Cat}^{\mathrm{op}} \times \mathbf{Cat} \to \mathbf{Cat}$.
Hakimi Rashid said:
What if I define a 2-category with the following diagram and then define a 2-functor to another 2-category? Will it achieve the goal of having variable categories?
You likely can't do this with cospans in a nice way, since if $X$ is a non-empty category and $Y$ is the empty category, then $[X, Y]$ is also empty, so no functor $X \to [X, Y]$ even exists..!
If we restrict ourselves to compatible categories, would the construction work?
What do you expect your functors $F$ and $G$ to do? More generally, do you have an aim that's not accomplished by understanding $[-,-]$ itself as a 2-functor, without the cospan attached?
(That's not intended to be confrontational, I just want more of an insight into where you're going with this :grinning_face_with_smiling_eyes: )
Hakimi Rashid said:
After reading your paper with John Baez on the relative entropy functor, I've come to realize that any distance function is a function that kind of lives in between two categories, if I understand it correctly. The input of the function lives in one category - for example a Markov category, in the case of probability distributions - and the output lives in the realm of the monoid of real numbers. An exception would be vector spaces, because vectors and the output of an inner product can live in the same category. Is this right?
The reason is I want to have a construction that generalizes distance and divergence along this line of thinking.
The functor $F$ is the input and $G$ is the output.
In a typical distance situation you have a "pairing" which takes a pair of "things" $x$ and $y$ and outputs a "value" $d(x, y)$. When $x$ and $y$ are members of spaces or categories, we might impose the restriction that the pairing is natural in its variables, which is to say that varying $x$ and $y$ gives a comparable variation in the value of $d(x, y)$.
However, I don't know what situation you're imagining where there are mappings/transformations $F$ and $G$; that's what I want to understand. You don't need these mappings in order for the pairing to make sense; you can just start with the pair of categories consisting of the input and output, without requiring that those maps/morphisms exist.
https://arxiv.org/pdf/1402.3067.pdf and https://www.cs.mcgill.ca/~prakash/Pubs/ccre.pdf
This is because I wanted to follow the approach taken by the papers above. They defined KL divergence as a functor between two categories. I want to generalize this to other distance measures as well, which might take different categories as input and output. I think what I should use is a category of functor categories, right?
The pairing is a construction akin to an inner product, right? Where both $x$ and $y$ are the inputs and $d(x, y)$ is the output. If I understand their paper correctly, $\mathbf{FinStat}$ should be the input category, $[0,\infty]$ is the output category, and $[\mathbf{FinStat}, [0,\infty]]$ is the functor category where the KL divergence lives.
So, if this is correct, then we can describe the inner product as the following: $[\mathsf{Vect}, \mathbb{R}]$ or $[\mathsf{Vect}, \mathsf{Vect}]$. Right?
No, inner products act on elements within vector spaces. So in that case, for a specific vector space $V$, an inner product is a bilinear map $V \times V \to \mathbb{R}$. It's not an operation on the ordinary category of vector spaces.
But if we think of $\mathbb{R}$ as a 1-dimensional vector space, can we somehow define the bilinear map using an endofunctor? So, I was thinking of it as paraphrasing the definition in terms of a higher abstraction.
Hakimi Rashid said:
But if we think of $\mathbb{R}$ as a 1-dimensional vector space, can we somehow define the bilinear map using an endofunctor? So, I was thinking of it as paraphrasing the definition in terms of a higher abstraction.
But there can be many inner products on a given vector space (or, more generally, pairings between vector spaces), and linear maps do not necessarily respect them or canonically extend them, so the construction of inner products is not functorial.
I don't understand. Would you kindly elaborate more on inner products being non-functorial? And also on the issue of having many inner products.
Why can't the construction $[\mathsf{Vect}, \mathbb{R}]$ subsume the inner product?
On making inner products functorial: the first problem is that there is no canonical way to equip a vector space with an inner product. One way to resolve this is to work instead with vector spaces equipped with a basis, since in that case there is a canonical inner product making that basis orthonormal. Next, one has the task of deciding which morphisms to choose: a morphism should consist of a linear map, but how should that interact with the bases, bearing in mind that we want the functor to send morphisms to transformations between inner product spaces? Well, one way that we can express that a linear map $f : V \to W$ "respects the inner product" is that it should end up satisfying the equation $\langle f(x), f(y) \rangle_W = \langle x, y \rangle_V$, where $\langle -, - \rangle_V$ and $\langle -, - \rangle_W$ are the inner products on the respective spaces. In other words, we want the linear maps whose matrix with respect to the chosen bases has orthonormal columns. The result is a category of vector spaces with bases on which we can functorially assign inner products, but it no longer looks very much like $\mathsf{Vect}$, since for example there are no morphisms from higher dimensional spaces to lower dimensional spaces.
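In matrix terms, assuming $f : V \to W$ has matrix $A$ with respect to the chosen bases, the condition reads
$$
\langle Ax, Ay \rangle = \langle x, y \rangle \ \text{ for all } x, y
\quad\Longleftrightarrow\quad
A^{\top} A = I,
$$
so the columns of $A$ are orthonormal, which forces $\dim V \le \dim W$ and explains why no such morphisms go from higher to lower dimensions.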
Note in particular that if you had wanted to view $\mathsf{Vect}$ as a category enriched over itself, this restricted category is no longer $\mathsf{Vect}$-enriched.
As for "the construction $[\mathsf{Vect}, \mathbb{R}]$": we can make $\mathbb{R}$ into an ordinary category (by equipping it with morphisms representing the ordering), but this structure is not particularly compatible with its vector space structure, and in particular any functors from $\mathsf{Vect}$ into this category are constant, so reading $[\mathsf{Vect}, \mathbb{R}]$ as a functor category won't give you anything resembling an inner product. We could instead interpret $\mathbb{R}$ as a one-object category enriched over the category of real vector spaces, but then $[\mathsf{Vect}, \mathbb{R}]$ consists of functors which necessarily map each vector space to that one object, and provides a linear map from each vector space of linear maps to $\mathbb{R}$. That's something similar to an inner product, but still not quite what you're looking for, I think.
wow, thank you for your explanation. I need some time to unpack the whole thing. Looks like there's a lot of homework I need to do just to understand your answer.
When understanding CT, it helps to unpack all of the definitions, if nothing else to make sure that something you've written makes sense! Good luck :grinning:
[Mod] Morgan Rogers said:
On making inner products functorial: the first problem is that there is no canonical way to equip a vector space with an inner product. One way to resolve this is to work instead with vector spaces equipped with a basis, since in that case there is a canonical inner product making that basis orthonormal. Next, one has the task of deciding which morphisms to choose: a morphism should consist of a linear map, but how should that interact with the bases, bearing in mind that we want the functor to send morphisms to transformations between inner product spaces? Well, one way that we can express that a linear map $f : V \to W$ "respects the inner product" is that it should end up satisfying the equation $\langle f(x), f(y) \rangle_W = \langle x, y \rangle_V$, where $\langle -, - \rangle_V$ and $\langle -, - \rangle_W$ are the inner products on the respective spaces. In other words, we want the linear maps whose matrix with respect to the chosen bases has orthonormal columns. The result is a category of vector spaces with bases on which we can functorially assign inner products, but it no longer looks very much like $\mathsf{Vect}$, since for example there are no morphisms from higher dimensional spaces to lower dimensional spaces.
If I understand this part, the $[\mathsf{Vect}, \mathsf{Vect}]$ that captures the inner product will only be true for a very limited subset of $\mathsf{Vect}$, right?
[Mod] Morgan Rogers said:
We could instead interpret $\mathbb{R}$ as a one-object category enriched over the category of real vector spaces, but then $[\mathsf{Vect}, \mathbb{R}]$ consists of functors which necessarily map each vector space to that one object, and provides a linear map from each vector space of linear maps to $\mathbb{R}$. That's something similar to an inner product, but still not quite what you're looking for, I think.
I think this is quite similar to what they define for relative entropy functor in those papers, right?
Hakimi Rashid said:
I think this is quite similar to what they define for relative entropy functor in those papers, right?
No, their relative entropy is a specific functor; $[\mathsf{Vect}, \mathbb{R}]$ is a whole category of functors.
Hakimi Rashid said:
If I understand this part, the $[\mathsf{Vect}, \mathsf{Vect}]$ that captures the inner product will only be true for a very limited subset of $\mathsf{Vect}$, right?
It seems like this notation does not mean what you think it means. $[\mathsf{Vect}, \mathsf{Vect}]$ means either the category of ordinary endofunctors of $\mathsf{Vect}$, or the category of $\mathsf{Vect}$-enriched endofunctors of $\mathsf{Vect}$. I don't think either of these things in any way "captures inner product".
Yeah, [Vect, Vect] has nothing much to do with "inner product".
Oh, I get it. If I try to generalize by simply going higher in the hierarchy of abstraction, then it will carry with it a whole load of other baggage that is unnecessary and unrelated to the concepts I am trying to generalize.
I thought I could generalize both inner products and other statistical distances by following the pattern laid out in the papers. Has it ever been done before? Or is it something that is not possible?
If we restrict ourselves to only statistical distances and exclude vector spaces and inner products, then we can still capture many other statistical distances / similarity measures, such as correlation, KL divergence, Wasserstein distance, etc., using a functor category, right?
So the construction $[X, Y]$ still applies; we just need to define compatible categories $X$ and $Y$. Is this correct?
Hakimi Rashid said:
I thought I could generalize both inner products and other statistical distances by following the pattern laid out in the papers.
But the thing you keep describing doesn't look anything like what appears in the papers you mentioned!!
In the papers, we have $RE : \mathbf{FinStat} \to [0,\infty]$...
There is just one rather special functor in each situation. The notation $[0,\infty]$ is classical interval notation for "the positive real numbers, with infinity", rather than a functor category, which might have confused you?
Oh no. I am really confused now. I need to dissect the whole thing again to pinpoint where I might go wrong. KL divergence in their paper is a functor right?
A functor between the input category (in the papers, $\mathbf{FinStat}$ and $\mathbf{SbStat}$) and the output category (the 'measure category').
Since it is a specific functor, it lives in the functor category [input, output].
Hence, if I want to define other measures (e.g. correlation, Wasserstein, etc.) categorically, I can follow their 'formula' by defining, for each measure, a functor from a specific category to a compatible measure category.
so far... anything wrong?
Every specific measure is a specific functor between specific categories. They live in different functor categories.
I thought that if I could have a construction where I can vary the input category and the output category, and thus the functor category, I could generalise the definition, of which the relative entropy functor is one specific example.
Yes, that's all correct, but
Rather than considering the whole functor category (which probably contains a lot more stuff than you need, yet on the other hand which it's hard to find anything much in!), a good approach would be to identify what features make their construction work. What makes a good "input category" or "output category"? What features make this approach work? In particular, which features are common to the other examples that you want to consider?
By that, you mean it is possible to have a single categorical construct that can be used to define them all? Or is there no such thing, and thus I need to define them one by one?
There could well be a single construction that works for many of them; it's all about finding the right level of generality :grinning_face_with_smiling_eyes:
Ok. understood.
I appreciate that wasn't a very specific pointer to what you should do! Concretely, I'd say you have enough work ahead if you just focus on defining categories on which the measures make sense, categories which are suitable for measuring, and concrete examples of functors between them, so I'm just discouraging you from dealing with whole functor categories before you're ready. It might be that the properties featuring in the article (semicontinuous, convex linear, etc.) have interesting translations into properties of the corresponding objects in the functor category, and I personally would be very interested to discover whether that is the case, but finding that out might not be the most direct route to the generalisation you're seeking.
Thank you for your tips and guidance. So, this has never been done before? Or do you know someone who has worked / is currently working on it? I actually was hoping that this had been done before, so that I could use the result as part of my work.
Hi again. I was wondering... can we define the characteristic function of a random variable within a Markov category?
Briefly, I would say that we can't define characteristic functions in Markov categories just yet. Markov categories are much more general than plain old probability theory. Characteristic functions on the other hand are a concept rather specific to real-valued random variables in the traditional sense (although a very powerful concept for sure). That's why I think that there won't be something like a characteristic function in general Markov cats.
But it's entirely conceivable that there is something like a characteristic function for states in certain Markov categories on objects which "look like" the real numbers. I'm pretty sure though that nobody knows how to do this so far, so it's one of the many open questions.
It's also possible that there will be another concept for suitable Markov categories which is different from the characteristic function, but can replace the latter for some of its purposes, such as in the proof of the central limit theorem. But all of this is speculation at the moment, so just take it as a description of the scope of conceivable possibilities.
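For reference, the classical notion under discussion here: the characteristic function of a real-valued random variable $X$ with law $P_X$ is
$$
\varphi_X(t) \;=\; \mathbb{E}\big[e^{\mathrm{i}tX}\big] \;=\; \int_{\mathbb{R}} e^{\mathrm{i}tx} \,\mathrm{d}P_X(x),
$$
a deterministic function of $t$, since the expectation has already been taken.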
Ok, thank you for your feedback @Tobias Fritz. Another question, if I may... From what I have seen so far, deterministic maps are defined as maps that respect copying, but in terms of string diagrams they are drawn the same as probabilistic ones. So how can we differentiate between the two? Is it just by looking at the context in which they occur?
It's easy to write down a string diagram that expresses the fact that a map respects copying. I guess if you want you can draw such maps in a different color or something.
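Written as an equation rather than a string diagram, the standard condition is that $f : X \to Y$ is deterministic when
$$
\mathrm{copy}_Y \circ f \;=\; (f \otimes f) \circ \mathrm{copy}_X.
$$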
Oh, I was thinking about doing something like that... but probably my question should be: why haven't people done that so far? Is it because you can just look at the context, so there is no need to differentiate them diagrammatically?
my question should be: why haven't people done that so far?
I don't know why or even if they haven't, but go ahead and do it - if you're the first, maybe people will use your notation.
John Baez said:
I don't know why or even *if* they haven't, but go ahead and do it - if you're the first, maybe people will use your notation.
I haven't seen it... but my knowledge is very limited. Maybe @Tobias Fritz knows better.
There was some brief discussion on this topic here. (I'm in favour of graphically distinguishing stochastic morphisms from deterministic ones, as I think it makes things easier to follow, but Tobias gave some reasonable points against it.)
Thank you @Nathaniel Virgo for the input.
I'm trying to adapt the relative entropy construction following https://arxiv.org/pdf/1402.3067.pdf and https://www.cs.mcgill.ca/~prakash/Pubs/ccre.pdf using a Markov category. In the papers, a morphism between objects in $\mathbf{FinStat}$ is a pair $(f, s)$, and the RE functor sends it to $S(p, s \circ q)$. Why do we need the $f$ morphism? Can we do it with just $s$, leaving out $f$?
Quite generally, John's point is probably the best: if you invent a piece of notation and like it, then go ahead and use it! If it's useful and makes things more intelligible to others, then others will start using it too.
During the writing of our latest paper, we had indeed considered using a separate notation for deterministic morphisms. As in the discussion that @Nathaniel Virgo has linked to, there are advantages and disadvantages, and in the end we decided against doing it. But I'd be curious to see a paper which does it, in order to see how it pans out in practice.
Hakimi Rashid said:
I'm trying to adapt the relative entropy construction following https://arxiv.org/pdf/1402.3067.pdf and https://www.cs.mcgill.ca/~prakash/Pubs/ccre.pdf using a Markov category. In the papers, a morphism between objects in $\mathbf{FinStat}$ is a pair $(f, s)$, and the RE functor sends it to $S(p, s \circ q)$. Why do we need the $f$ morphism? Can we do it with just $s$, leaving out $f$?
Interesting question! Perhaps there's a way to do without it, which would be very interesting to see. But in $\mathbf{FinStat}$, the role played by the map $f$ is that it is measure-preserving between $(X, p)$ and $(Y, q)$, meaning that $q = f \circ p$, which you can regard as the definition of $q$. Then you may just as well write $(Y, f \circ p)$. So when putting it like this, it's actually the $q$ which is not needed!
I see. By the way, I'm also considering the kind of solution that would be brought about by following an enriched Markov category, similar to this thesis: https://www.erischel.com/documents/mscthesis.pdf. It's just that I don't know which would be easier and generalize better to other statistical distances. Maybe to define other distances, we need to consider different categories in which to enrich the Markov category?
Hakimi Rashid said:
I'm trying to adapt the relative entropy construction following https://arxiv.org/pdf/1402.3067.pdf and https://www.cs.mcgill.ca/~prakash/Pubs/ccre.pdf using a Markov category. In the papers, a morphism between objects in $\mathbf{FinStat}$ is a pair $(f, s)$, and the RE functor sends it to $S(p, s \circ q)$. Why do we need the $f$ morphism? Can we do it with just $s$, leaving out $f$?
There are lots of different categories, good for different things. In the category that Tobias and I called $\mathbf{FinStat}$, a morphism describes 1) how a state of the system being observed deterministically produces an observation, and 2) a recipe for guessing the state of the system from an observation. Part 1) is the measure-preserving function $f$ from $X$ to $Y$, and part 2) is the stochastic map $s$ from $Y$ back to $X$.
I think if you read the introduction to our paper you'll see that both of these are used to compute relative entropy. We give the formula, and it involves both $f$ and $s$. We also explain what's going on.
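For reference, the formula in question: in the paper, a morphism $(f, s) : (X, p) \to (Y, q)$ of $\mathbf{FinStat}$ is sent to the relative entropy
$$
RE(f, s) \;=\; S(p, s \circ q) \;=\; \sum_{x \in X} p_x \ln\frac{p_x}{(s \circ q)_x},
$$
which involves $s$ through the prior $s \circ q$, and $f$ through the constraint $q = f \circ p$.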
Thank you for the input @John Baez. I've read the introduction of that paper and, if I understand it correctly, you have framed the definition around a scenario or example of how RE might be used. However, in the setting of the application I'm considering, I don't think I have the measure-preserving morphism $f$. I want to compare two time series from sensor recordings, $X$ and $Y$, that have values in $\mathbb{R}$.
So, in this setting, I want to know how different / similar they are to each other.
Tobias Fritz said:
Briefly, I would say that we can't define characteristic functions in Markov categories just yet. Markov categories are much more general than plain old probability theory. Characteristic functions on the other hand are a concept rather specific to real-valued random variables in the traditional sense (although a very powerful concept for sure). That's why I think that there won't be something like a characteristic function in general Markov cats.
But it's entirely conceivable that there is something like a characteristic function for states in certain Markov categories on objects which "look like" the real numbers. I'm pretty sure though that nobody knows how to do this so far, so it's one of the many open questions.
It's also possible that there will be another concept for suitable Markov categories which is different from the characteristic function, but can replace the latter for some of its purposes, such as in the proof of the central limit theorem. But all of this is speculation at the moment, so just take it as a description of the scope of conceivable possibilities.
I'm guessing the reason is that if we apply the Fourier transform to a random variable, then we land in another category outside of the Markov category. Is this true?
Hakimi Rashid said:
I'm guessing the reason is that if we apply the Fourier transform to a random variable, then we land in another category outside of the Markov category. Is this true?
Not exactly. The reason is because there is no such thing as "apply the Fourier transform" in the first place. How would that be defined?
I was thinking of defining it as a functor. Maybe I don't quite grasp the whole concept just yet.
Hi again. I'm still in the dark about why it does not make sense to define the Fourier transform of a random variable as a functor from the Markov category to 'another' category, a CF category: $\mathcal{F} : \mathbf{Markov} \to \mathbf{CF}$, and its inverse as $\mathcal{F}^{-1} : \mathbf{CF} \to \mathbf{Markov}$. So we can have a monad as the composition of the two, $\mathcal{F}^{-1} \circ \mathcal{F}$. Would you kindly walk me through it? Thank you.
When you propose a new mathematical idea, then you need to explain why you think that it does make sense, and this seems to be missing here. In other words, what does your proposal have to do with characteristic functions or the Fourier transform at all? Which categories and which functors do you need to pick in order to obtain a categorical description of the classical Fourier transform?
So what I know is that a real-valued random variable is, categorically speaking, a morphism $\Omega \to \mathbb{R}$ in the Markov category Stoch (or BorelStoch), which is the Kleisli category of the Giry monad on measurable spaces (or standard Borel spaces). What I don't know is what your $\mathbf{CF}$ and $\mathcal{F}$ might be, and how these would describe the Fourier transform.
Also, if $\mathcal{F}^{-1}$ is the inverse of $\mathcal{F}$, then the induced monad $\mathcal{F}^{-1} \circ \mathcal{F}$ is the identity monad, which does not seem interesting. Perhaps you mean adjoint rather than inverse?
Tobias Fritz said:
So what I know is that a real-valued random variable is, categorically speaking, a morphism $\Omega \to \mathbb{R}$ in the Markov category Stoch (or BorelStoch), which is the Kleisli category of the Giry monad on measurable spaces (or standard Borel spaces). What I don't know is what your $\mathbf{CF}$ and $\mathcal{F}$ might be, and how these would describe the Fourier transform.
So, here you mean I need to define a category of characteristic functions in such a way that $\mathcal{F}$ is a functor that will capture the notion of the Fourier transform, right?
Tobias Fritz said:
Also, if $\mathcal{F}^{-1}$ is the inverse of $\mathcal{F}$, then the induced monad $\mathcal{F}^{-1} \circ \mathcal{F}$ is the identity monad, which does not seem interesting. Perhaps you mean adjoint rather than inverse?
Why does the identity monad not seem interesting in this case? If adjoint, what could it mean concretely?
I thought the main idea of using the Fourier transform is to work in a 'transformed' space that completely preserves the information of the original space. So you can do something within this space that would be hard to do in the original space and can transform the result back to the original space. So, identity seems 'correct', right?
Hakimi Rashid said:
So, here you mean I need to define a category of characteristic functions in such a way that $\mathcal{F}$ is a functor that will capture the notion of the Fourier transform, right?
All I mean is that you need to explain the meaning and significance of your idea in order for anyone else to be able to comment on it. One way to achieve that is to explain which particular category and which functor you have in mind.
Why does the identity monad not seem interesting in this case? If adjoint, what could it mean concretely?
The identity monad is never interesting, just like the identity functor isn't an interesting functor or the one-element group isn't an interesting group. There isn't much that you can do with the identity monad on any category C: both its Kleisli category and its Eilenberg-Moore category are just C again.
I thought the main idea of using the Fourier transform is to work in a 'transformed' space that completely preserves the information of the original space. So you can do something within this space that would be hard to do in the original space and can transform the result back to the original space.
That sounds like a good description. But this doesn't necessarily mean that it's appropriate to do the same thing at the categorical level. The categorical generalization of a standard concept often takes quite a different form than the original concept. For example, epimorphisms are a categorical generalization of surjective functions, but the definition of epimorphism and the definition of surjective function are quite different.
Maybe I should start with some context. I am working with time series data of random variables. Many methods have been developed to compute the statistical 'distance/similarity' between two time series; these include KL divergence, mutual information, etc. The time series can also be represented in the frequency domain by applying the Fourier transform to each one. We can then also compute the 'distance/similarity' between them within the frequency domain. Examples include coherence and many more.
For the time domain, I think a Markov category is one suitable category that I can work with. But for the frequency domain, I thought there should exist a 'dual' of the Markov category, or maybe some other category that is somehow connected to the original Markov category.
My aim is to distill / unify the process of:
1) 'embedding' a time series into a particular representation,
2) using this representation structure to compute a 'statistical distance'.
Am I on the right track using Markov categories so far? Or should I consider another direction?
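A minimal sketch of steps 1) and 2) for the frequency-domain route, using scipy's coherence estimator on two made-up sensor-like signals (all parameters illustrative):
```python
import numpy as np
from scipy.signal import coherence

fs = 100.0                        # sampling rate (Hz)
t = np.arange(0, 10, 1 / fs)
rng = np.random.default_rng(0)

# Two noisy time series sharing a 5 Hz component.
x = np.sin(2 * np.pi * 5 * t) + rng.normal(0, 0.5, t.size)
y = 0.8 * np.sin(2 * np.pi * 5 * t) + rng.normal(0, 0.5, t.size)

# 1) embed into a frequency-domain representation and
# 2) compute a similarity: magnitude-squared coherence per frequency.
freqs, cxy = coherence(x, y, fs=fs, nperseg=256)
print(freqs[np.argmax(cxy)])      # peak coherence should sit near 5 Hz
```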
Tobias Fritz said:
All I mean is that you need to explain the meaning and significance of your idea in order for anyone else to be able to comment on it. One way to achieve that is to explain which particular category and which functor you have in mind.
Say, what if the objects in the category CF are the Fourier transforms of the pdfs of random variables, and the morphisms in CF are the Fourier transforms of the morphisms between random variables in the Markov category. Would that be correct for what I'm aiming for?
Yes, that clarifies things for me. If I understand correctly, the Fourier transform that you're taking is not the characteristic function, right? Because that would be $\varphi_X(t) = \mathbb{E}[e^{itX}]$ for one fixed random variable $X$, and this function is no longer random because the expectation value has been taken. While what you're doing is to take the Fourier transform in time separately for every realization of the process, which is then itself a random function. Right?
As far as Markov categories go, a number of people have already asked about developing stochastic process theory within that framework, but so far this doesn't exist yet. I can imagine that the equivalence that you describe has a categorical description in Markov category terms, but instead of it being an equivalence or isomorphism between two Markov categories, to me it looks more like an isomorphism of objects internal to a single Markov category.
In any case, for thinking about such a thing I guess it would help to already have some stochastic process theory in place for Markov categories — so you may want to think about this more generally first. How do you define a Markov category of stochastic processes? What are the morphisms? And how can one formulate and prove some of the basic theorems on stochastic processes?
Tobias Fritz said:
Yes, that clarifies things for me. If I understand correctly, the Fourier transform that you're taking is not the characteristic function, right? Because that would be $\varphi_X(t) = \mathbb{E}[e^{itX}]$ for one fixed random variable $X$, and this function is no longer random because the expectation value has been taken. While what you're doing is to take the Fourier transform in time separately for every realization of the process, which is then itself a random function. Right?
Ah, yes. I realize my mistake. I thought what I'm looking for is the characteristic function, since both involve taking a Fourier transform. But what you describe is closer to what I'm aiming for. It should be random.
So, the Fourier transform is deterministic, but in combination with the pdf, the result is still random. Does this make sense? Because the result has a different sample space (frequency) than the original sample space.
Tobias Fritz said:
As far as Markov categories go, a number of people have already asked about developing stochastic process theory within that framework, but so far this doesn't exist yet. I can imagine that the equivalence that you describe has a categorical description in Markov category terms, but instead of it being an equivalence or isomorphism between two Markov categories, to me it looks more like an isomorphism of objects internal to a single Markov category.
So, the objects (the Fourier transformed random variables (RVs) of the stochastic process) live in the Markov category, since they are still RVs?
Yep, that all sounds right to me :smile:
Thank you .
Hi again. Regarding the RE functor developed in the paper (https://arxiv.org/pdf/1402.3067.pdf): you and @John Baez have considered the case where there is one system and one measurement, with morphisms between them: $f$ is regarded as the 'measuring process' and $s$ as the 'hypothesis'. Given a true probability $p$ and a 'prior' $s \circ q$, RE is defined such that it is the amount of information gained when we update our prior to the true probability.
Whereas, in my case, I am considering two systems, each with probabilities $p$ and $q$ on their states. Both send signals that can be recorded as time series. They may communicate with each other, thus influencing each other's states with a certain probability. I want to define the RE between them by adapting the definition provided by the paper.
First, define channels $g : X \to Y$ and $h : Y \to X$ to represent the communications between them. Then RE is a functor that sends objects to the single object of the category $[0,\infty]$, the morphism $g$ to $S(q, g \circ p)$ and the morphism $h$ to $S(p, h \circ q)$. Would this work?
Also, I want to define the mutual information and correlation between the two systems following a similar line of thinking...
Have you checked that your definition does what you need in a simple setting, like FinStoch?
No, I haven't. Could you show me how, or point the way? I'm no mathematician; my background is in biology, but I want to use category theory to describe part of the data analysis involved in my current research.
Are your channels stochastic matrices? If so, you can work in FinStoch (the category with finite sets as objects and stochastic matrices as morphisms) and try to prove your candidate functor respects identities and composition.
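A minimal numpy sketch of this suggestion, with channels as row-stochastic matrices (rows index inputs, each row is the output distribution); the helpers are made up for illustration:
```python
import numpy as np

def is_stochastic(m, tol=1e-9):
    """A FinStoch morphism: each row is a probability distribution."""
    return bool(np.all(m >= -tol) and np.allclose(m.sum(axis=1), 1, atol=tol))

def compose(f, g):
    """Composite channel: apply f, then g (rows index inputs)."""
    return f @ g

def rel_entropy(p, q):
    """S(p, q) for finite distributions, with the 0 log 0 = 0 convention.
    Assumes q > 0 wherever p > 0 (otherwise the value is infinite)."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

f = np.array([[0.5, 0.5], [0.1, 0.9]])        # channel X -> Y
g = np.array([[1.0, 0.0], [0.25, 0.75]])      # channel Y -> Z
assert is_stochastic(compose(f, g))            # composition stays stochastic
assert np.allclose(compose(np.eye(2), f), f)   # identity law

p = np.array([0.5, 0.5])                       # a state on X
print(rel_entropy(p, p @ f))                   # compare p with its image under f
```
From here one can check, on small examples, whether a candidate assignment of numbers to channels respects identities and composition.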
Ok. That seems workable for me. Thank you.
Does $\mathbf{FinStoch}$ suffice, though? Because I want to use the Fourier transform...
I'm afraid nobody is answering your question because nobody knows what it means. "Does $\mathbf{FinStoch}$ suffice?" is vague. It may be hard to turn this into a mathematically precise question, but if you do, then more people will answer it.
Oh, I see. I think I know somewhat of an answer... In practice, we can only record a finite amount of signal and can only analyze a finite amount of data, so working with the category $\mathbf{FinStoch}$ should be enough. Regarding the Fourier transform, in practice we use the fast Fourier transform algorithm, which computes the discrete Fourier transform, and we apply the transform to finite time series, so again $\mathbf{FinStoch}$ should be the right category. Is this correct?
When working in the category $\mathbf{FinStoch}$, we limit ourselves to finite sets, right?
Yes, that's what the "Fin" in "FinStoch" means.
Thank you. Maybe I should limit myself to the finite category, since that is what we do in practice.
It makes a bunch of theorems easier to prove, which is why Tobias and I focused on that case. Measure theory is simple on finite sets.
To be honest... I was expecting to grab low-hanging fruit when I stumbled upon category theory. I was expecting that people had worked on all the parts related to what I do, and that I could just cite their theorems/proofs and compile them together. When I found Evan's thesis, https://arxiv.org/pdf/2006.08945.pdf, I hit the jackpot. But it seems the work did not cover correlation and other statistical distances/similarities. Are you aware of any work that has been done on this or related to this?
Nope, ask him.
There's a huge amount of work on category theory, but Evan - a young guy who just finished grad school - is one of the first to apply it to statistics. You've hit the jackpot if you want to help develop an interesting new branch of mathematics... but not if you just want to "cite theorems".
Anyway, you should ask @Evan Patterson, not me, what work he's aware of in this area. He's the expert.
This is probably not very helpful, but I remember Tom Leinster had some interesting articles about population and diversity. It is not traditional statistics I guess, but I actually had some success applying that to finance replacing populations of penguins with financial securities. When markets all move together, e.g. during a crisis, the effective diversity of the population decreases. I used it to identify extreme conditions for stress testing.
Did you use any distance measure/ statistical divergence in that work?
I actually had some success applying that to finance replacing populations of penguins with financial securities.
Wow, you financed replacing populations of penguins with financial securities? What a typical Wall Street thing to do! :upside_down:
You might also look at the paper here, which gives a relation between the cumulants of a distribution (mean, variance, central moments, ...) and homotopy theory, though it is not an easy paper.
https://arxiv.org/abs/1302.3684
If you are interested, here are the old articles that inspired me:
Thank you @Spencer Breiner and @Eric Forgy. I'll take my time and go through the papers.
John Baez said:
I actually had some success applying that to finance replacing populations of penguins with financial securities.
Wow, you financed replacing populations of penguins with financial securities? What a typical Wall Street thing to do! :upside_down:
Collateralized Penguin Obligations. AAA rated by Moody's :nerd:
Penguins may go extinct, but Collateralized Penguin Obligations will remain. :penguin: :penguin: :penguin:
CPOs precipitated the last global financial crisis, but don't blame me :sweat_smile:
FYI, Tom's work on this is now in book form and due to be published soon.
Fawzi Hreiki said:
FYI, Tom's work on this is now in book form and due to be published soon.
Cool. I always liked these ideas. Good to see them coming to fruition :+1:
Semi-seriously, diversification is important in portfolio management, and Tom's ideas on diversity certainly apply to financial time series, where each time series is like a different species. That is my second "first" in mathematical finance, I guess (although never published). The first "first" was applying noncommutative geometry to finance :nerd:
Hi all. Another question: how can we define algebraic operations on random variables in a Markov category? For example, addition, subtraction, multiplication, etc.
Would this work:
1) for multiplication, compose the joint distribution $I \to \mathbb{R} \otimes \mathbb{R}$ with the multiplication map $\cdot : \mathbb{R} \otimes \mathbb{R} \to \mathbb{R}$,
2) for addition, compose it with $+ : \mathbb{R} \otimes \mathbb{R} \to \mathbb{R}$,
3) for subtraction, compose it with $- : \mathbb{R} \otimes \mathbb{R} \to \mathbb{R}$,
4) for division, compose it with $\div : \mathbb{R} \otimes \mathbb{R} \to \mathbb{R}$?
In https://arxiv.org/pdf/2006.08945.pdf, the notion of an interacting supply was introduced so that several structures, such as vector spaces, can be defined within a Markov category. In this case, if we supply a Markov category with vector spaces, can we then define an inner product within the Markov category?
That's @Evan Patterson's thesis.
Not directly, since an abstract vector space doesn't come with an inner product. You can get partway there by introducing a symmetric bilinear map, since those properties are purely equational. At least within the framework of my thesis, you can't express the positive definiteness of an inner product.
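Spelling that out: bilinearity and symmetry,
$$
\langle x + \lambda x', y \rangle = \langle x, y \rangle + \lambda \langle x', y \rangle,
\qquad
\langle x, y \rangle = \langle y, x \rangle,
$$
are purely equational, while positive definiteness, $\langle x, x \rangle > 0$ for $x \neq 0$, is an inequality, and that is what falls outside a purely equational framework.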
Thank you for your reply, Evan. I'm gonna need some time to digest it. In the meantime, do you happen to know the answer to my question regarding the algebraic operations on random variables?
John Baez said:
You've hit the jackpot if you want to help develop an interesting new branch of mathematics... but not if you just want to "cite theorems".
Thank you John, that feels welcoming. But the current me is really not equipped to develop new math. I don't have formal higher math education/training. Currently, I can only digest and hopefully understand proven theorems, link them together and, if applicable, use them in my work.
I understand. Then you may be somewhat frustrated: perhaps not all the tools you want exist yet. You can work with the tools that exist, or try to find a mathematician collaborator who can develop the new tools that you need. To work with mathematicians it helps to state your needs as precisely as possible, in a lot of detail, starting by describing your background assumptions. I often don't understand what you're saying.
It sounds like Evan understood your last question, and his reply makes a lot of sense to me. He said the definition of inner product isn't purely equational: it involves an inequality too. From this I guess he has a framework for introducing purely equational concepts into the theory of random variables. This makes sense, because there's a lot of math developed for purely equational theories, like "Lawvere theories".
All this is just a bunch of guesses based on his reply: I haven't read the paper of his, that you're talking about. I mention my guesses just to show that a mathematician's view may be very different than yours: what's clear to you may be mysterious to the mathematician, and what's mysterious to you may be clear to the mathematician. So, you need to put a lot of energy into clear communication if you want to reach mathematicians.
Sorry to bother you again, @Evan Patterson. In your thesis, you mentioned loss functions in the notes and references section of chapter 3. From the string diagram, I think we can define a loss function as a morphism $L : \Theta \otimes A \to \mathbb{R}$. Is this true?
Is the loss function deterministic?
Can I use it to compare between two variables?
The loss function has the form $L : \Theta \otimes A \to \mathbb{R}$: for a given parameter $\theta$, what is the loss under action $a$? Decision rules have the form $\delta : X \to A$: given the data $x$, what action do I take?
AFAIK, people always take the loss to be deterministic, but you could make it randomized. Decision rules can also be randomized and sometimes actually are. For example, resampling procedures like cross-validation are often randomized. In the end, decision theorists study the risk (expected loss), which averages out the randomness in the samples, the decision rule, and the loss (if you were to allow randomized losses).
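In symbols (standard decision theory, with $P_\theta$ the distribution of the data under parameter $\theta$), the risk is the expected loss:
$$
R(\theta, \delta) \;=\; \mathbb{E}_{x \sim P_\theta}\big[L(\theta, \delta(x))\big].
$$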
Thank you for the input @Evan Patterson. I was thinking of defining relative entropy similarly to the form of the loss function. Since the KL divergence is the expected log difference between two probabilities $p$ and $q$, can we define it in a Markov category as $\mathbb{E}_p[\log p - \log q]$,
where $\mathbb{E}_p$ is the expectation and $\log p - \log q$ is the log difference between $p$ and $q$?