Category Theory
Zulip Server
Archive

You're reading the public-facing archive of the Category Theory Zulip server.
To join the server you need an invite. Anybody can get an invite by contacting Matteo Capucci at name dot surname at gmail dot com.
For all things related to this archive refer to the same person.


Stream: learning: questions

Topic: convex analysis


view this post on Zulip Emmy Blumenthal (they/them) (Jan 12 2022 at 03:57):

Hi everyone,
Currently, I'm working on a project finding niche subsets of larger datasets using some techniques in convex optimization. As part of my research, I have come across the concept of strong duality in convex spaces, but I have yet to find a categorification of these ideas. Has anyone worked on this or knows of any resources where I can read about this? I'm aware of the categories Conv and ConvRel, but I am having a difficult time finding a definitive summary of them. My first guess is that strong duality when categorically formalized may involve a functor from the category of pointed convex spaces, Conv_\star, to the pre/total order category on \R. Formalizing the primal (minimization/maximization) problem in such a fashion would then yield the dual (maximization/minimization) problem when examining the opposite category.

view this post on Zulip John Baez (Jan 12 2022 at 04:22):

What's "strong duality" for convex spaces? I might know something if you explain it, but I've never heard of it.

view this post on Zulip David Egolf (Jan 12 2022 at 04:34):

John Baez said:

What's "strong duality" for convex spaces? I might know something if you explain it, but I've never heard of it.

Doing a google search, I found these notes:
https://people.eecs.berkeley.edu/~elghaoui/Teaching/EE227A/lecture8.pdf

A segment that seems relevant:
[image: "strong duality" excerpt from the lecture notes]

view this post on Zulip David Egolf (Jan 12 2022 at 04:44):

A bit more about duality, in a way I find more intuitive (from "Sparse Convex Optimization Methods for Machine Learning"):
[image: "duality" excerpt]

The idea described here is to approximate a convex function at each point by a supporting hyperplane. Then you can try to minimize this approximation over some domain of interest.
(I have used some of this stuff when working on reconstructing images, where the image reconstruction process can be viewed as optimizing a convex function.)
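To make that idea concrete, here is a minimal numpy sketch (the convex function and the anchor points are illustrative choices, not taken from the thread): each tangent line is a supporting hyperplane, so the pointwise maximum of a few tangents is a piecewise-linear under-estimator of the function, and its minimum lower-bounds the true minimum.

```python
import numpy as np

# Toy convex function and its derivative (illustrative choice, not from the thread).
f = lambda x: x**2 + 1.0
df = lambda x: 2.0 * x

# Supporting hyperplane (here: tangent line) of f at an anchor point a.
tangent = lambda x, a: f(a) + df(a) * (x - a)

# The pointwise maximum of a few tangents is a piecewise-linear under-estimator of f.
anchors = [-2.0, 0.5, 1.5]
xs = np.linspace(-3.0, 3.0, 601)
lower = np.max([tangent(xs, a) for a in anchors], axis=0)

assert np.all(lower <= f(xs) + 1e-9)                  # supporting-hyperplane property
print("min of f on the grid:            ", f(xs).min())
print("min of the tangent approximation:", lower.min())   # lower-bounds the true min
```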

view this post on Zulip David Egolf (Jan 12 2022 at 04:50):

(Also, this is a great resource for convex optimization: https://web.stanford.edu/~boyd/cvxbook/ )

view this post on Zulip Emmy Blumenthal (they/them) (Jan 12 2022 at 05:17):

David Egolf said:

(Also, this is a great resource for convex optimization: https://web.stanford.edu/~boyd/cvxbook/ )

This is exactly the primary book I'm referencing. I think they develop the concept especially well and give some intuition.

My brief summary is that when certain hypotheses hold (feasibility is sufficient in least-squares, quadratic, and linear programming) for a convex objective function, defined over convex and affine sets, which we wish to minimize (the primal problem), we can write out the dual problem's Lagrangian. Maximizing the dual Lagrangian w.r.t. the Lagrange/KKT multipliers then becomes equivalent to minimizing the primal objective function; when this occurs, we have strong duality, and sup of the dual = inf of the objective.

My research has a strong emphasis on the role of Lagrange multipliers as first-order approximations for a sort of generalized constraint force (e.g., $\beta = 1/(k_B T)$ in thermodynamics), so ideally any categorical point of view will develop the role of Lagrange multipliers. I tried looking at @John Baez's new paper "Compositional Thermostatics" to see if I could extract any intuition from how temperature is described, but I don't think I've explored enough to know if I can find any connections; however, I suspect there is something there since, in thermostatics, we are trying to maximize entropy---a concave function---of a distribution (a convex space) subject to the constraint of constant energy. The notion that the dual problem handles constraints with an additional linear term in the Lagrangian is equivalent to the notion that heat exchange between the system and the bath is insignificant compared to the bath itself (the thermodynamic limit). This last paragraph is definitely a little conjecture-y, but I was inspired by the thermostatics paper to start thinking a bit about these ideas.

Thanks for any help!
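To make the primal/dual summary above concrete, here is a minimal numerical sketch on a deliberately tiny made-up problem (minimize $f(x) = x^2$ subject to $g(x) = 1 - x \le 0$; the problem, grids, and code are illustrative and not from the thread): the dual function $q(\lambda) = \inf_x \mathcal{L}(x,\lambda)$, maximized over $\lambda \ge 0$, matches the primal optimum, so strong duality holds here.

```python
import numpy as np

# Toy primal problem (illustrative, not from the thread):
#   minimize f(x) = x^2   subject to   g(x) = 1 - x <= 0   (i.e. x >= 1)
f = lambda x: x**2
g = lambda x: 1.0 - x
L = lambda x, lam: f(x) + lam * g(x)        # Lagrangian with the sign convention above

xs = np.linspace(-5.0, 5.0, 2001)
lams = np.linspace(0.0, 10.0, 2001)

# Primal optimum: minimize f over the feasible set {x : g(x) <= 0}.
primal = f(xs[g(xs) <= 1e-12]).min()

# Dual function q(lam) = inf_x L(x, lam); dual optimum: maximize q over lam >= 0.
q = np.array([L(xs, lam).min() for lam in lams])
dual = q.max()

print("primal optimum:", primal)            # ~1.0, attained at x = 1
print("dual optimum:  ", dual)              # ~1.0, attained at lam = 2
print("duality gap:   ", primal - dual)     # ~0  =>  strong duality for this problem
```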

view this post on Zulip John Baez (Jan 12 2022 at 05:27):

Thanks! Certainly Lagrange multipliers and the Legendre transform of convex (or concave) functions are fundamental in thermodynamics, and my paper with @Owen Lynch on compositional thermostatics is a kind of warmup for studying those things in a modern way... but we don't actually say anything about them in our paper!

view this post on Zulip John Baez (Jan 12 2022 at 05:29):

So there's more to be done.

view this post on Zulip Emmy Blumenthal (they/them) (Jan 12 2022 at 05:46):

Is that to say that convex optimization hasn't been developed much in category theory or specifically in the context of thermodynamics? Do you know if anyone is working on related topics now? Thanks!

view this post on Zulip John Baez (Jan 12 2022 at 05:50):

A large chunk of thermodynamics and statistical mechanics basically is convex optimization.

view this post on Zulip John Baez (Jan 12 2022 at 05:50):

So if that's the connection you're trying to make, you can just read about thermodynamics, stat mech and convex optimization and see how they fit together.

view this post on Zulip John Baez (Jan 12 2022 at 05:51):

I was saying that modern math - the category of convex sets, operads, and such - is just starting to be applied, to clean up these ideas and make them more powerful.

view this post on Zulip Emmy Blumenthal (they/them) (Jan 12 2022 at 06:06):

I see. I'm trying to find maximally informative subsets of large datasets by using the MacArthur Consumer Resource model (see this paper https://arxiv.org/abs/1809.04221). Each data point becomes a consumer in an ecosystem which competes with other data points for 'resources'. If all the consumers are aiming to maximize consumption (framed as a least-squares problem) subject to the constraints that there is a finite amount of resources and that data points can only consume certain resources at certain rates, some consumers will go extinct, like a species pushed out of a niche in an ecosystem. This may be expressed by asking whether the Lagrange multiplier associated with a constraint is non-zero; if it is, the data point 'survives'. So far, we've implemented numerical experiments which have shown that training several different machine learning algorithms on these niche subsets of the MNIST training data produces higher accuracy on the test data than training on a randomly selected subset of the training data of the same size as the niche subset. I am hoping to find some category theory somewhere in this project, and I thought I might look to see if Lagrange multipliers are discussed at all in category theory, but I haven't found anything too elucidating yet.
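A hedged sketch of that "is the multiplier nonzero?" test, reading off Lagrange (dual) values at the optimum with cvxpy: the library choice, the random data, and the budget constraint are all my illustrative stand-ins, not the actual consumer-resource model from the paper.

```python
import numpy as np
import cvxpy as cp

# Made-up stand-in for the real consumer-resource setup: a small non-negative
# least-squares problem with a budget constraint (A, b and the constraints are
# illustrative placeholders).
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 8))     # "resource use" matrix (random placeholder)
b = rng.standard_normal(20)          # "resource supply" vector (random placeholder)

x = cp.Variable(8)
constraints = [x >= 0, cp.sum(x) == 1]
problem = cp.Problem(cp.Minimize(cp.sum_squares(A @ x - b)), constraints)
problem.solve()

# KKT / Lagrange multipliers are exposed as the constraints' dual values.
mult = constraints[0].dual_value     # one multiplier per coordinate of x
print("x* =", np.round(x.value, 3))
print("multipliers for x >= 0:", np.round(mult, 3))
# The "is the multiplier nonzero?" test: which non-negativity constraints are
# active (numerically nonzero dual value) at the optimum.
print("coordinates with nonzero multiplier:", np.where(mult > 1e-6)[0])
```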

view this post on Zulip Peter Arndt (Jan 12 2022 at 09:06):

Simon Willerton offered a categorical perspective on the Legendre-Fenchel transform. It's not a _categorified_ version, but maybe it helps anyway. Here are a preprint and a video.

view this post on Zulip Peter Arndt (Jan 12 2022 at 09:10):

The Legendre-Fenchel dual of a function is what Boyd/Vandenberghe call the conjugate function. It is closely related to the dual optimization problem, see Boyd/Vandenberghe, Section 5.1.6.
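As a concrete anchor for that connection, here is a minimal numpy sketch (the function $f(x) = x^2/2$ is an illustrative choice) that computes the conjugate $f^*(y) = \sup_x \big( yx - f(x) \big)$ by brute force on a grid and checks it against the known closed form $y^2/2$:

```python
import numpy as np

# Conjugate (Legendre-Fenchel transform):  f*(y) = sup_x [ y*x - f(x) ].
# Illustrative choice: f(x) = x^2 / 2, whose conjugate is known to be y^2 / 2.
f = lambda x: 0.5 * x**2

xs = np.linspace(-20.0, 20.0, 40001)            # brute-force grid over x
conjugate = lambda y: np.max(y * xs - f(xs))

for y in np.linspace(-3.0, 3.0, 7):
    print(f"y = {y:5.2f}   f*(y) = {conjugate(y):7.4f}   y^2/2 = {0.5 * y**2:7.4f}")
```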

view this post on Zulip Peter Arndt (Jan 12 2022 at 09:11):

Your idea sounds cool!

view this post on Zulip Jules Hedges (Jan 12 2022 at 10:28):

I tried for a short time to formulate linear programming duality as a duality in the sense of category theory. I think it's likely that it's possible and a good idea, but it's definitely a research question and not a "learning" question.

view this post on Zulip Jules Hedges (Jan 12 2022 at 10:30):

Side comment: Linear programming is arguably the method that the world runs on; I can't overstate the extent to which it's used behind the scenes by supply chains. If, hypothetically, methods coming from category theory could be used to make linear programming say 1% more efficient computationally, it would probably outweigh everything else we could do combined...

view this post on Zulip Simon Willerton (Jan 12 2022 at 11:50):

As well as the stuff on the Legendre-Fenchel transform that Peter mentions above, I've been thinking about optimization in the context of enriched category theory more generally. As Jules said, this is a serious research project, and I'm making the occasional foray into it. One thing I have been thinking out loud about at the n-Café is the specific optimization problem of optimal transport.

In this specific problem of optimal transport, I think there are deeper connections with category theory that I've not yet discussed there, related to Kantorovich duality and probability monads. (Paolo Perrone and Tobias Fritz have certainly been thinking about things there.) Convex spaces and concave functions have categorical descriptions as well, in terms of probability or convexity monads. So I think this is a ripe area.

view this post on Zulip Simon Willerton (Jan 12 2022 at 11:56):

With regard to the strong duality that was asked about: one view of this is that weak duality is given by the max-min inequality for a Lagrangian, and strong duality is when that inequality is actually an equality. Here are my vague categorical thoughts on that in Part II.

Viewed in this way, the max-min inequality

$$\inf_{x\ge 0} \,\sup_{y\ge 0} \mathcal{L}(x,y) \;\ge\; \sup_{y\ge 0} \,\inf_{x\ge 0} \mathcal{L}(x,y)$$

is a manifestation of the canonical morphism that exists between expressions with a limit and colimit interchanged: for a functor $\mathcal{L} \colon X \times Y \to R$, there is a canonical morphism

$$\mathrm{colim}_X\, \mathrm{lim}_Y\, \mathcal{L} \;\to\; \mathrm{lim}_Y\, \mathrm{colim}_X\, \mathcal{L},$$

see the nLab. Similarly the minimax theorem holding is perhaps to be thought of as a statement about certain colimits and limits commuting.
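A quick finite sanity check of that direction (the random matrices are arbitrary stand-ins for $\mathcal{L}$, nothing specific to the thread): on any finite "payoff matrix", the min over rows of the max over columns dominates the max over columns of the min over rows.

```python
import numpy as np

rng = np.random.default_rng(1)

# Check  min_x max_y L(x, y)  >=  max_y min_x L(x, y)  on random finite "Lagrangians",
# with rows playing the role of x and columns the role of y.
for _ in range(1000):
    L = rng.standard_normal((rng.integers(2, 8), rng.integers(2, 8)))
    inf_sup = L.max(axis=1).min()    # min over rows of (max over columns)
    sup_inf = L.min(axis=0).max()    # max over columns of (min over rows)
    assert inf_sup >= sup_inf        # the max-min (weak duality) inequality

print("max-min inequality held on 1000 random matrices")
```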

view this post on Zulip Fabrizio Genovese (Jan 12 2022 at 14:36):

@Dan Marsden and I categorified convexity for language a bunch of years ago: https://arxiv.org/pdf/1703.01204.pdf

view this post on Zulip Fabrizio Genovese (Jan 12 2022 at 14:36):

Probably tangential to what you are looking for, but maybe interesting?

view this post on Zulip Emmy Blumenthal (they/them) (Jan 12 2022 at 16:19):

Thank you so much for sending these resources! I'll be sure to read through them and would love to stay up-to-date on any research in this area. I would love to be involved in this research, but I fear that I might not have the skills/experience/knowledge to work on these issues as I'm only an undergraduate student. This thread has also indicated to me that I should put off introducing category theory to my current research project as it may significantly extend this project in unexpected ways; hopefully, I'll revisit this later in my career and will be able to introduce more category-theoretic concepts.

view this post on Zulip Emmy Blumenthal (they/them) (Jan 12 2022 at 16:20):

I try to visit the n-category cafe pretty often, so hopefully I'll see much more on optimization there sometime in the future!

view this post on Zulip Morgan Rogers (he/him) (Jan 12 2022 at 16:26):

Always great to see an undergrad looking ahead to things they would like to research involving category theory :grinning_face_with_smiling_eyes:

view this post on Zulip Emmy Blumenthal (they/them) (Jan 12 2022 at 16:28):

Thanks :) Hopefully I'll see some of you at the Adjoint School this summer for which I will be applying!

view this post on Zulip John Baez (Jan 12 2022 at 17:27):

but I fear that I might not have the skills/experience/knowledge to work on these issues as I'm only an undergraduate student.

That just means that you're younger and smarter than other people working on this subject.

view this post on Zulip John Baez (Jan 12 2022 at 17:28):

This thread has also indicated to me that I should put off introducing category theory to my current research project as it may significantly extend this project in unexpected ways...

Ha, that was quick. Is it that you were hoping something had been worked out that you could rather quickly apply?

view this post on Zulip John Baez (Jan 12 2022 at 17:30):

Anyway, that's fine.

The conversation has been useful to me already, since I just realized from what Simon said that this "strong duality" business, which I'd never heard of, has the minimax theorem in game theory as a special case.

view this post on Zulip John Baez (Jan 12 2022 at 17:31):

I occasionally teach game theory to undergrads, leading up to the proof of the minimax theorem for 2-person games, which I'm sure must seem rather nightmarishly complicated to many of them.

view this post on Zulip John Baez (Jan 12 2022 at 17:32):

In this proof, the "easy part" is proving the inequality

$$\inf_{x\ge 0} \,\sup_{y\ge 0} \mathcal{L}(x,y) \;\ge\; \sup_{y\ge 0} \,\inf_{x\ge 0} \mathcal{L}(x,y)$$

which you can do just by fiddling around, whereas the "hard part" is proving the reverse inequality.
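For completeness, the "fiddling around" can be written in two steps (this is just the standard argument). For any fixed $x_0, y_0 \ge 0$,

$$\sup_{y\ge 0} \mathcal{L}(x_0, y) \;\ge\; \mathcal{L}(x_0, y_0) \;\ge\; \inf_{x\ge 0} \mathcal{L}(x, y_0).$$

The left side no longer depends on $y_0$ and the right side no longer depends on $x_0$, so taking the infimum over $x_0$ on the left and then the supremum over $y_0$ on the right gives

$$\inf_{x\ge 0} \,\sup_{y\ge 0} \mathcal{L}(x,y) \;\ge\; \sup_{y\ge 0} \,\inf_{x\ge 0} \mathcal{L}(x,y).$$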

view this post on Zulip John Baez (Jan 12 2022 at 17:33):

And as Simon had pointed out to me earlier, the easy part is just a general fact about limits and colimits. The hard part is something more special.

view this post on Zulip John Baez (Jan 12 2022 at 17:34):

So it's interesting that this is something that people think about and call "strong duality".

view this post on Zulip John Baez (Jan 12 2022 at 17:36):

I am hoping to find some category theory somewhere in this project, and I thought I might look to see if Lagrange multipliers are discussed at all in category theory, but I haven't found anything too elucidating yet.

I believe Simon's stuff about Legendre transforms is secretly the stuff about Lagrange multipliers that you're seeking.

view this post on Zulip John Baez (Jan 12 2022 at 17:38):

At least, that's true if the functions you're trying to maximize subject to constraints are concave, or the functions you're trying to minimize subject to constraints are convex: this is the case where the method of Lagrange multipliers becomes something called the Legendre transform.
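To make that precise in the convex setting, here is the standard statement from Boyd/Vandenberghe §5.1.6 (written for linear constraints $Ax \preceq b$, $Cx = d$; the notation is theirs, not something introduced in this thread). For the problem "minimize a convex $f(x)$ subject to $Ax \preceq b$, $Cx = d$", the dual function is

$$g(\lambda,\nu) \;=\; \inf_x \Big( f(x) + \lambda^{\mathsf T}(Ax - b) + \nu^{\mathsf T}(Cx - d) \Big) \;=\; -\,b^{\mathsf T}\lambda \;-\; d^{\mathsf T}\nu \;-\; f^{*}\!\big(-A^{\mathsf T}\lambda - C^{\mathsf T}\nu\big),$$

where $f^{*}(y) = \sup_x \big( y^{\mathsf T} x - f(x) \big)$ is the conjugate (Legendre-Fenchel transform). So maximizing over the Lagrange multipliers $(\lambda,\nu)$ is an optimization built directly out of the Legendre transform of the objective.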

view this post on Zulip John Baez (Jan 12 2022 at 17:40):

Hopefully I'll see some of you at the Adjoint School this summer for which I will be applying!

I hope you also apply to the American Mathematical Society's Mathematical Research Community on applied category theory. This is another summer program on applied category theory, which I'm teaching a course in.

view this post on Zulip Emmy Blumenthal (they/them) (Jan 12 2022 at 20:24):

John Baez said:

This thread has also indicated to me that I should put off introducing category theory to my current research project as it may significantly extend this project in unexpected ways...

Ha, that was quick. Is it that you were hoping something had been worked out that you could rather quickly apply?

Yes, I was hoping something had been worked out so that I could apply it to my research. However, I spoke briefly with @Brendan Fong, and he recommended that preparing a literature review on the categories Conv/ConvRel could be useful to my own understanding as well as to other researchers/students.

view this post on Zulip Emmy Blumenthal (they/them) (Jan 12 2022 at 20:26):

I've been looking at that work on Legendre transforms in category theory and am finding that there might be something I can build off of! Thank you so much for all the feedback, and I'll be sure to apply to that as well!

view this post on Zulip Emmy Blumenthal (they/them) (Jan 12 2022 at 21:40):

As I'm reading and writing some things out about the Legendre transform from the categorical POV, I realize that I don't precisely understand its correspondence to Lagrange multipliers. I am noticing that the general theme is that maximizing ${\cal L} = f(x) + \sum_i \lambda_i g_i(x)$, the Lagrangian of a (convex) cost function $f$ (i.e., minimize $f(x)$ w.r.t. $x$), where $g_i(x) \leq 0$ are the imposed constraints and $\lambda_i$ are the associated Lagrange multipliers, looks just like $\max_{p_i} \sum_i p_i x_i + f(x)$, but I have yet to find an exact statement discussing their correspondence. However, this general concept of using Legendre transforms to work with concave functions is very familiar from statistical mechanics. Does anyone know where I can find a precise statement of this relationship?

Also, please let me know if I'm getting a sign wrong somewhere.

Thank you so much for any ideas!

view this post on Zulip John Baez (Jan 12 2022 at 21:56):

Sign issues are always annoying because some people are minimizing convex functions while others are maximizing concave ones, and separately some people might write $+\sum_i \lambda_i g_i(x)$ while others might write $-\sum_i \lambda_i g_i(x)$: these are separate decisions. But this can be straightened out when you actually need to: then you just carefully write down your own assumptions and do your own calculations. So, I wouldn't worry about it until I need to.

view this post on Zulip John Baez (Jan 12 2022 at 21:58):

The basic idea is this, I think. You've got a function $f: \mathbb{R}^n \to \mathbb{R}$ that you're trying to maximize subject to a constraint

$$g(x) = c$$

where $g : \mathbb{R}^n \to \mathbb{R}$ is some other function.

view this post on Zulip John Baez (Jan 12 2022 at 21:59):

So, you look for critical points of the function

$$f + \lambda g$$

where $\lambda \in \mathbb{R}$ is some new variable.

view this post on Zulip John Baez (Jan 12 2022 at 22:01):

Say this just has one critical point, which of course depends on $\lambda$. This assumption of "one critical point" will hold under some conditions.
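A small symbolic sketch of this recipe with sympy, on a made-up example (maximize $f(x,y) = -(x^2 + y^2)$ subject to $g(x,y) = x + y = c$; the choice of $f$ and $g$ is purely illustrative): first find the critical point of $f + \lambda g$, which depends on $\lambda$, then choose $\lambda$ so the constraint holds.

```python
import sympy as sp

x, y, lam, c = sp.symbols('x y lambda c', real=True)

# Illustrative objective and constraint (my choice, not from the thread):
# maximize f(x, y) = -(x^2 + y^2)  subject to  g(x, y) = x + y = c.
f = -(x**2 + y**2)
g = x + y
L = f + lam * g                      # the function formed above: f + lambda*g

# Its critical point in (x, y) -- which, as noted, depends on lambda.
crit = sp.solve([sp.diff(L, x), sp.diff(L, y)], [x, y], dict=True)[0]
print("critical point:", crit)       # {x: lambda/2, y: lambda/2}

# Choose lambda so the constraint g = c actually holds at that critical point.
lam_star = sp.solve(sp.Eq(g.subs(crit), c), lam)[0]
print("lambda from the constraint:", lam_star)                          # lambda = c
print("constrained maximizer:", {v: e.subs(lam, lam_star) for v, e in crit.items()})
```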

view this post on Zulip John Baez (Jan 12 2022 at 22:03):

(To be continued, and perhaps edited.)