Category Theory
Zulip Server
Archive

You're reading the public-facing archive of the Category Theory Zulip server.
To join the server you need an invite. Anybody can get an invite by contacting Matteo Capucci at name dot surname at gmail dot com.
For all things related to this archive refer to the same person.


Stream: learning: questions

Topic: Categorifying the median or the average


view this post on Zulip Peva Blanchard (Feb 28 2024 at 23:32):

Hi,

I have questions regarding ways to "aggregate things". We do have well-known notions of statistics, like the "average" or the "median" of a collection of numbers. It is also well-known that these statistics can be reformulated as optimization problems, e.g.:

average(S)=arg minzxSxz2 \textrm{average}(S) = \argmin_z \sum_{x \in S} |x - z|^2

or

median(S)=arg minzxSxz \textrm{median}(S) = \argmin_z \sum_{x \in S} |x - z|

It is also well-known that limits/co-limits are some kind of optimization problems. So it seems natural to ask if one can reformulate the average or median (or similar) of a set of objects in category-theoretical terms.

I thought it would be easy to google my way through, but it turns out not to be so easy to find.

Are there known references that approach that question?

ps: I am thinking about something as follows. Let XX be a VV-enriched category (à la Lawvere metric space), and SS a finite subset of objects of XX. For any object zz of XX, we can compute the "total cost" ϕz\phi_z in VV

ϕz=xSX(z,x) \phi_z = \bigotimes_{x \in S} X(z,x)

with X(z,x)X(z,x) the hom-object in VV. Then we can look for the terminal object (minimizer) of the full subcat spanned by the ϕz\phi_z, and check how it can be realized in XX, and so on. Or something like that.

pps: The closest paper I found was a paper by Matilde Marcolli on "Pareto categories", where she provides a reformulation of Pareto optimization in the context of categories of resources.

view this post on Zulip Peva Blanchard (Feb 28 2024 at 23:32):

ppps: It's also quite funny that the "median" may possibly be easier to reformulate than the "average". For the average we would need to give meaning to quadratic expressions v2v^2 for objects vv in VV.

view this post on Zulip Oscar Cunningham (Feb 29 2024 at 22:32):

A quadratic expression might use both X(z.x)X(z.x) and X(x,z)X(x,z) in a nicely symmetric way.

view this post on Zulip Paolo Perrone (Mar 01 2024 at 10:50):

Exactly. I've been wondering this myself:
We would like to have something along these lines:
xd(z,x)d(x,z)\sum_{x} d(z,x) \cdot d(x,z)
which seems very similar to this:
xC(z,x)×C(x,z).\coprod_x C(z,x) \times C(x,z) .
Now in enriched category theory we can replace the Cartesian product ×\times with a more general monoidal structure \otimes. However, as far as I know, we cannot replace the coproduct by a more general other monoidal structure, over which \otimes distributes.
(And if we enrich in real numbers, the coproduct is either "sup" or "inf", not the usual sum.)
So what I think is going on here is a further generalization of the idea of enriched category.

I've been thinking for a while about this, without ever managing to figure it out. If anyone has concrete ideas, I'd love to hear about them!

view this post on Zulip Paolo Perrone (Mar 01 2024 at 10:56):

For context, this would not just be for averages, but in general to talk about (usual) matrix multiplication.

view this post on Zulip Ralph Sarkis (Mar 01 2024 at 13:26):

Can all generalized means be formulated as optimization problems? (in particular, can the geometric mean?)

view this post on Zulip Peva Blanchard (Mar 01 2024 at 13:31):

Ralph Sarkis said:

Can all generalized means be formulated as optimization problems? (in particular, can the geometric mean?)

Yes, there is a summary table on Wikipedia.

view this post on Zulip Peva Blanchard (Mar 01 2024 at 13:47):

Paolo Perrone said:

Exactly. I've been wondering this myself:
We would like to have something along these lines:
xd(z,x)d(x,z)\sum_{x} d(z,x) \cdot d(x,z)
which seems very similar to this:
xC(z,x)×C(x,z).\coprod_x C(z,x) \times C(x,z) .
Now in enriched category theory we can replace the Cartesian product ×\times with a more general monoidal structure \otimes. However, as far as I know, we cannot replace the coproduct by a more general other monoidal structure, over which \otimes distributes.
(And if we enrich in real numbers, the coproduct is either "sup" or "inf", not the usual sum.)
So what I think is going on here is a further generalization of the idea of enriched category.

I've been thinking for a while about this, without ever managing to figure it out. If anyone has concrete ideas, I'd love to hear about them!

This looks very tempting to play with indeed! I guess we could just assume that VV has two compatible monoidal structures, like (Vect,,)(Vect, \otimes, \oplus). Aren't there papers about VectVect-enriched categories?

We can also think about matrices as profunctors, and composition is given by the coend

C(x,z)=yA(x,y)B(y,z) C(x,z) = \int^y A(x,y) \otimes B(y,z)

I'm sure you already know that. But then why the "coend" picture doesn't fit the bill for you?

view this post on Zulip James Deikun (Mar 01 2024 at 17:00):

I think the "median" given in the original question is just the product in a Lawvere metric space actually.

view this post on Zulip James Deikun (Mar 01 2024 at 17:08):

Furthermore, it's actually a "medoid" and depending which Lawvere metric space structure you put on the sample space it captures a lot of the different generalized means.

view this post on Zulip Paolo Perrone (Mar 01 2024 at 19:35):

Peva Blanchard said:

Paolo Perrone said:

Exactly. I've been wondering this myself:
We would like to have something along these lines:
$\sum_{x} d(z,x) \cdot d(x,z)$
which seems very similar to this:
$\coprod_x C(z,x) \times C(x,z) .$
Now in enriched category theory we can replace the Cartesian product $\times$ with a more general monoidal structure $\otimes$. However, as far as I know, we cannot replace the coproduct by a more general other monoidal structure, over which $\otimes$ distributes.
(And if we enrich in real numbers, the coproduct is either "sup" or "inf", not the usual sum.)
So what I think is going on here is a further generalization of the idea of enriched category.

I've been thinking for a while about this, without ever managing to figure it out. If anyone has concrete ideas, I'd love to hear about them!

This looks very tempting to play with indeed! I guess we could just assume that $V$ has two compatible monoidal structures, like $(Vect, \otimes, \oplus)$. Aren't there papers about $Vect$-enriched categories?

We can also think about matrices as profunctors, and composition is given by the coend

C(x,z)=yA(x,y)B(y,z) C(x,z) = \int^y A(x,y) \otimes B(y,z)

I'm sure you already know that. But then why the "coend" picture doesn't fit the bill for you?

In nice situations, a coend is a coequalizer of a coproduct. I guess one can replace the "coproduct" part and keep the equalizer? I don't know.

I can't come up with a consistent definition of "generalized enriched category" in that context though.

view this post on Zulip Peva Blanchard (Mar 03 2024 at 10:47):

James Deikun said:

I think the "median" given in the original question is just the product in a Lawvere metric space actually.

Mmh, I'm not sure about that, because it seems to me that the product in a Lawvere metric space is rather ill-behaved.

Let V=([0,],,+,0)V = ([0, \infty], \geq, +, 0) be Lawvere's Cost category. Let W:SVW : S \rightarrow V be a weighting over SS (seen as a discrete category), and ll be the corresponding weighted product of objects of SS (assuming it exists). Then, unfolding the definition of weighted limit, we have for every object ss in SS, and every other object zz in XX

X(l,s)WsX(z,l)=supsS{X(z,s)Ws} X(l, s) \le W s\\ X(z, l) = \sup_{s \in S} \{ X(z,s) - W s \}

where the minus sign stands for truncated subtraction. In particular, if the weighting assigns 00 to every object of SS, then the limit ll is at distance 00 from every object in SS, which, most of the time, is possible only when SS is a singleton.

Taking co-products will not help much either as it will just swap sup\sup for inf\inf. It seems to me that the median/medoid/mean/etc. cannot be expressed as a weighted limits/co-limits of this kind.

view this post on Zulip James Deikun (Mar 03 2024 at 12:41):

Ah, you're right, I guess it's not quite a limit. Actually you could view sSX(x,s)\bigotimes_{s \in S} X(x,s) as the space of cones with vertex xx and here instead of the one terminal cone you're looking for an object xx so that every cone over SS is represented by some cone with apex xx instead of a particular one. So it's a particular kind of multi-limit I guess?

view this post on Zulip Peva Blanchard (Mar 03 2024 at 16:38):

Well, I tried to look at some concrete examples, when we take only two objects, say aa and bb.

With the enriching category V=(Set,×,1)V = (Set, \times, 1). We are looking at the full sub-category Γ\Gamma of SetSet spanned by the

ϕz=X(a,z)×X(b,z)\phi z = X(a, z) \times X(b, z)

Now, if XX has a terminal object 11, then

ϕ1=X(a,1)×X(b,1)1\phi 1 = X(a, 1) \times X(b,1) \simeq 1

which is terminal in SetSet, and thus terminal in Γ\Gamma as well. This implies that the "median" of aa and bb in XX is just the terminal object of XX.

view this post on Zulip Peva Blanchard (Mar 03 2024 at 16:58):

This argument, however, does not apply if we choose the set-co-product as monoidal product instead, i.e., V=(Set,+,0)V = (Set, +, 0). The full sub-cat Γ\Gamma is spanned by

ϕz=X(a,z)+X(b,z)\phi z = X(a, z) + X(b,z)

for all zz in XX. If XX has a terminal object 11, we have

ϕ1=X(a,1)+X(b,1)1+1\phi 1 = X(a, 1) + X(b, 1) \simeq 1 + 1

which is not terminal in SetSet.

If we assume that ϕm\phi m is terminal in Γ\Gamma for some mm in XX, it means we have, for all zz, a unique function

X(a,z)+X(b,z)X(a,m)+X(b,m)X(a,z) + X(b,z) \rightarrow X(a,m) + X(b,m)

I think this implies that X(a,m)+X(b,m)=1X(a,m) + X(b,m) = 1, which in turn implies

X(a,m)=1X(b,m)=0X(a,m) = 1\\ X(b,m) = 0

or

X(a,m)=0X(b,m)=1X(a,m) = 0\\ X(b,m) = 1

Either way, it does not look very promising for mm.

view this post on Zulip Jean-Baptiste Vienney (Mar 03 2024 at 21:10):

I don't know if it's useful but I'm thinking to this: if you have a category enriched over Q0\mathbb{Q}_{\ge 0}-modules, then you can average a finite set of morphisms f1,...,fn:ABf_1,...,f_n:A \rightarrow B by taking 1n(f1+...+fn)\frac{1}{n}(f_1+...+f_n).

Probably even less useful: I'm saying this because it happened that one day I used a notion of "probabilistic permutation" (something that I called a "blender") in a symmetric monoidal Q0\mathbb{Q}_{\ge 0}-linear category that I defined as a natural transformation AnAnA^{\otimes n} \rightarrow A^{\otimes n} of the form 1MσMσ\frac{1}{|M|}\underset{\sigma \in M}{\sum}\sigma for some multiset MM of permutative natural transformations AnAnA^{\otimes n} \rightarrow A^{\otimes n}.

view this post on Zulip Jean-Baptiste Vienney (Mar 03 2024 at 21:19):

This all I know about probability + categories enriched over the category of (semi) vector spaces over a (semi) field :sweat_smile:.

view this post on Zulip Jean-Baptiste Vienney (Mar 03 2024 at 21:26):

By the way, why do you want to average objects? Isn't it more natural to average morphisms which can be interpreted as processes?

view this post on Zulip Jean-Baptiste Vienney (Mar 03 2024 at 21:27):

When you take a limit, you take the limit of a cone, so it involves a set of morphisms rather than just a set of objects. I think the morphisms are the context which is creating the combinatorics of the situation thus allowing an interpretation in terms of optimization.

view this post on Zulip Jean-Baptiste Vienney (Mar 03 2024 at 21:29):

(just some random thoughts)

view this post on Zulip Jean-Baptiste Vienney (Mar 03 2024 at 21:36):

Generalization (I think, maybe loosely) of the first sentence I wrote by replacing the addition by a tensor product and allowing the domains and codomains to be different: If you have a monoidal category Q0\mathbb{Q}_{\ge 0}-linear category (ie. enriched over Q0\mathbb{Q}_{\ge 0}-modules/semi vector spaces), you could say that the average of the fi:AiBif_i:A_i \rightarrow B_i for 1in1 \le i \le n, is 1n(f1...fn):A1...AnB1...Bn\frac{1}{n}(f_1 \otimes ... \otimes f_n):A_1 \otimes ... \otimes A_n \rightarrow B_1 \otimes ... \otimes B_n

view this post on Zulip Jean-Baptiste Vienney (Mar 03 2024 at 21:46):

Jean-Baptiste Vienney said:

Probably even less useful: I'm saying this because it happened that one day I used a notion of "probabilistic permutation" (something that I called a "blender") in a symmetric monoidal Q0\mathbb{Q}_{\ge 0}-linear category that I defined as a natural transformation AnAnA^{\otimes n} \rightarrow A^{\otimes n} of the form 1MσMσ\frac{1}{|M|}\underset{\sigma \in M}{\sum}\sigma for some multiset MM of permutative natural transformations AnAnA^{\otimes n} \rightarrow A^{\otimes n}.

By the way, if ever it was useful, I found that you can write 12 different commutative diagrams involving the interaction of three different operations on blenders derived from tensor product, composition and enrichment. So I found this notion very cool.

view this post on Zulip Jean-Baptiste Vienney (Mar 03 2024 at 21:47):

Now I stop talking about my stuff :grinning_face_with_smiling_eyes:

view this post on Zulip Jean-Baptiste Vienney (Mar 03 2024 at 21:56):

Can someone recall the interpretation of a limit as the solution of an optimization problem? I think it would be useful in order to understand if there is really something probabilistic behind this idea or if it is really vague to say that it has anything to do with some optimization.

view this post on Zulip Jean-Baptiste Vienney (Mar 03 2024 at 21:57):

Sometimes mathematicians talk about some "philosophical interpretation" of some concept and I often find it rather vague (like the interpretation of Yoneda lemma).

view this post on Zulip Jean-Baptiste Vienney (Mar 03 2024 at 22:08):

Maybe, can someone also recall a definition of optimization of something in probability/statistics in order that we can compare the two ideas ? It could help in order to understand how to transfer the idea of mean from probability/statistics to category theory.

view this post on Zulip Jean-Baptiste Vienney (Mar 03 2024 at 22:20):

And does optimization really have anything to do with means? It seems to me that aggregation and thus means have to do with tensor products which are not limits.

view this post on Zulip Peva Blanchard (Mar 03 2024 at 22:49):

Jean-Baptiste Vienney said:

Sometimes mathematicians talk about some "philosophical interpretation" of some concept and I often find it rather vague (like the interpretation of Yoneda lemma).

You're right ! Now that you mention it, I'm not sure anymore if "it is well-known that limit/co-limits are some kind of optimization problems". I think I have in mind the case of posets with least upper bounds / greatest lower bounds. But the narrative certainly is too vague.

view this post on Zulip Jean-Baptiste Vienney (Mar 03 2024 at 22:59):

This probably not intelligent, but you could average objects like this in a a compact closed category with coproducts (= biproducts)
(I...I)(A1...An)(I \oplus ... \oplus I)^*(A_1 \oplus ... \oplus A_n) (nn copies of II)

view this post on Zulip Peva Blanchard (Mar 03 2024 at 23:01):

Jean-Baptiste Vienney said:

And does optimization really have anything to do with means? It seems to me that aggregation and thus means have to do with tensor products which are not limits.

Regarding ways to interpret "averaging" in category theory, there is a very interesting line of work by Tom Leinster and Tom Avery on codensity monads.

I still have to digest the details, so I'm going to be very hand-wavy. The "slogan" I understood is that the elements of the monad behave like integration operators. The usual integration operators are retrieved when considering the inclusion of a certain sub-category of the category of measurable spaces, whose codensity monad is the Giry monad. When considering the inclusion FinSetSetFinSet \rightarrow Set, we get the ultrafilter monad. And the corresponding integration operator corresponds to "voting w.r.t the ultrafilter".

view this post on Zulip Peva Blanchard (Mar 03 2024 at 23:05):

But for other reasons, I am also interested in the median (because it's a basic example of "robust" statistics). Hence me wondering we could categorify the "median". I thought that there could be a way through the formulation of the median as an optimization problem. But clearly, I'm not sure it's the right way.

view this post on Zulip Jean-Baptiste Vienney (Mar 03 2024 at 23:10):

I don't understand much but does it have anything to do with integrating with respect to a probability measure giving the probability of an event? and using the analogy coend \leftrightarrow integral?

view this post on Zulip Jean-Baptiste Vienney (Mar 03 2024 at 23:11):

I'm just asking naive questions, I don't know much about codensity monads and probability theory but like to think and learn about any stuff :sweat_smile:

view this post on Zulip Jean-Baptiste Vienney (Mar 03 2024 at 23:13):

Also, maybe weighted coends would be useful to average if this notion makes sense

view this post on Zulip Peva Blanchard (Mar 03 2024 at 23:26):

Well I will have trouble to answer, because I don't know much about codensity monads either ^^' they are on my list of "things it would be good to understand at some point".

However, it's not the analogy between coend and integral, although there is a formula involving an end.

If you have a functor G:BAG : B \rightarrow A, then you can ask for the right Kan extension of GG along itself, which, if the extension exists, yields a functor TG:AAT^G : A \rightarrow A. The universal property of the Kan extension makes TGT^G a monad on AA.

There is a formula using ends. For any object XX of AA

TGX=Y[A(X,GY),GY]T^G X = \int_Y [ A(X, GY), GY]

where [_,_]:Set×AA[\_,\_] : Set \times A \rightarrow A is a power.

Intuitively, an element of TGXT^G X is a machine that takes, for any YY, an element of A(X,GY)A(X, GY) and outputs an element of GYGY. In other words, an element of TGXT^G X acts like an integration operator for functions defined over XX.

One instance of this machinery lead to TGT^G being the Giry monad, i.e., the monad that associates to every measurable set XX the set of probability distributions over XX. Any such probability distribution on XX defines an integration operator for measurable functions on XX, i.e., the operator that output the expected value.

view this post on Zulip Jean-Baptiste Vienney (Mar 03 2024 at 23:46):

I'm thinking about something completely different. I think a one-element category enriched over commutative monoids such that every non-zero morphism is invertible and id+...+idid+...+id is always nonzero is the same as a semi-field of characteristic 00 (the multiplication is given by the composition)

view this post on Zulip Jean-Baptiste Vienney (Mar 03 2024 at 23:47):

And in a semi-field of characteristic 00 you can average by the formula 1n(x1+...+xn)\frac{1}{n}(x_1+...+x_n)

view this post on Zulip Jean-Baptiste Vienney (Mar 03 2024 at 23:48):

which becomes here (id+...+id)1;(x1+...+xn)(id+...+id)^{-1};(x_1+...+x_n)

view this post on Zulip Jean-Baptiste Vienney (Mar 03 2024 at 23:51):

I don't think it will help you in any way :joy:

view this post on Zulip Jean-Baptiste Vienney (Mar 03 2024 at 23:55):

Categorifying this, you get that if you have parallel morphisms f1,...,fn:ABf_1,...,f_n:A \rightarrow B in such a category, then you can define the mean as (id+...+id)1;(f1+...+fn)(id+...+id)^{-1};(f_1+...+f_n) but this just what I was saying before, nothing amazing (something like this, there is some nonsense in what I've written I think).

view this post on Zulip Jean-Baptiste Vienney (Mar 04 2024 at 00:07):

I should not talk when I don't have anything interesting to say and don't understand half of what is written, sorry.

view this post on Zulip Peva Blanchard (Mar 04 2024 at 07:14):

Jean-Baptiste Vienney said:

I should not talk when I don't have anything interesting to say and don't understand half of what is written, sorry.

Oh no don't worry for that :)

view this post on Zulip JR (Mar 04 2024 at 13:12):

Jean-Baptiste Vienney said:

By the way, why do you want to average objects? Isn't it more natural to average morphisms which can be interpreted as processes?

[Arrow category enters the chat]

view this post on Zulip James Deikun (Mar 04 2024 at 13:18):

Peva Blanchard said:

Well, I tried to look at some concrete examples, when we take only two objects, say aa and bb.

With the enriching category V=(Set,×,1)V = (Set, \times, 1). We are looking at the full sub-category Γ\Gamma of SetSet spanned by the

ϕz=X(a,z)×X(b,z)\phi z = X(a, z) \times X(b, z)

Now, if XX has a terminal object 11, then

ϕ1=X(a,1)×X(b,1)1\phi 1 = X(a, 1) \times X(b,1) \simeq 1

which is terminal in SetSet, and thus terminal in Γ\Gamma as well. This implies that the "median" of aa and bb in XX is just the terminal object of XX.

Well, in a category with a terminal object, the "distance" from any object to the terminal object is the monoidal unit and terminal object of Set\mathsf{Set}, aka "zero", so it's not surprising that it turns out this way when minimizing total distance...if you had the same situation in a metric space the same would happen.

view this post on Zulip Eric M Downes (Mar 12 2024 at 15:03):

Jean-Baptiste Vienney said:

Jean-Baptiste Vienney said:

Probably even less useful: I'm saying this because it happened that one day I used a notion of "probabilistic permutation" (something that I called a "blender") in a symmetric monoidal $\mathbb{Q}_{\ge 0}$-linear category that I defined as a natural transformation $A^{\otimes n} \rightarrow A^{\otimes n}$ of the form $\frac{1}{|M|}\underset{\sigma \in M}{\sum}\sigma$ for some multiset $M$ of permutative natural transformations $A^{\otimes n} \rightarrow A^{\otimes n}$.

By the way, if ever it was useful, I found that you can write 12 different commutative diagrams involving the interaction of three different operations on blenders derived from tensor product, composition and enrichment. So I found this notion very cool.

This is very interesting. I would love to see a writeup of this.

view this post on Zulip Eric M Downes (Mar 12 2024 at 15:20):

Paolo Perrone said:

xC(z,x)×C(x,z)\coprod_x C(z,x) \times C(x,z)
Now in enriched category theory we can replace the Cartesian product ×\times with a more general monoidal structure \otimes. However, as far as I know, we cannot replace the coproduct by a more general other monoidal structure, over which $\otimes$ distributes.

What if you formed a presheaf over the collection of products instead of a coproduct over them? Sheaves come equipped with stalks that can be used for convergence/ultrafilter kind of arguments -- if you have any measure of closeness or topology you could somehow filter down to the stalk which contains the single product most representative of the collection. This might have to do with the rate of stalk "narrowing" -- how many products it loses how fast... That's what we use averages and medians for anyway -- pick out a "representative member" of an enclosing set; here there may not be an enclosing thing, so we do our best with the members we have?

(Having equipped a site with a sheaf thus, you also usually get a sheaf of rings, using product and composition, I think... there is a forced sense in which \circ could distribute over ×\times by broadcasting arguments, though I have no idea if any of that gives you a desirable distributive property.)

view this post on Zulip Eric M Downes (Mar 12 2024 at 15:47):

I think all I have described reduces to taking a product of products, and then looking at various quotients of that which project down some products and lose others. Compositions of projectors then form a sequence that gives you the stalks on objects. The hope would be that sheaves might also give you more, but just looking at the "rate of change" of the cost over these sequences (πA×G×ZπA×B×CGZπA××Z\pi_{A\times G\times Z}\circ\pi_{A\times B\times C\ldots G\ldots Z}\circ\pi_{A\times\ldots \times Z}) might be illuminating, and would require no extra machinery... it looks vaguely geometric mean adjacent? I don't know how to actually access or manipulate them however, as I know only the rudiments of cost categories.