Category Theory
Zulip Server
Archive

You're reading the public-facing archive of the Category Theory Zulip server.
To join the server you need an invite. Anybody can get an invite by contacting Matteo Capucci at name dot surname at gmail dot com.
For all things related to this archive refer to the same person.

Stream: learning: questions

Topic: convex analysis

Emmy Blumenthal (they/them) (Jan 12 2022 at 03:57):

Hi everyone,
Currently, I'm working on a project finding niche subsets of larger datasets using some techniques in convex optimization. As part of my research, I have come across the concept of strong duality in convex spaces, but I have yet to find a categorification of these ideas. Has anyone worked on this or knows of any resources where I can read about this? I'm aware of the categories Conv and ConvRel, but I am having a difficult time finding a definitive summary of them. My first guess is that strong duality when categorically formalized may involve a functor from the category of pointed convex spaces, Conv_\star, to the pre/total order category on \R. Formalizing the primal (minimization/maximization) problem in such a fashion would then yield the dual (maximization/minimization) problem when examining the opposite category.

John Baez (Jan 12 2022 at 04:22):

What's "strong duality" for convex spaces? I might know something if you explain it, but I've never heard of it.

David Egolf (Jan 12 2022 at 04:34):

John Baez said:

What's "strong duality" for convex spaces? I might know something if you explain it, but I've never heard of it.

Doing a google search, I found these notes:
https://people.eecs.berkeley.edu/~elghaoui/Teaching/EE227A/lecture8.pdf

A segment that seems relevant:
strong duality

David Egolf (Jan 12 2022 at 04:44):

A bit more about duality, in a way I find more intuitive (from "Sparse Convex Optimization Methods for Machine Learning"):
duality

The idea described here is to approximate a convex function at each point by a supporting hyperplane. Then you can try to minimize this approximation over some domain of interest.
(I have used some of this stuff when working on reconstructing images, where the image reconstruction process can be viewed as optimizing a convex function.)

David Egolf (Jan 12 2022 at 04:50):

(Also, this is a great resource for convex optimization: https://web.stanford.edu/~boyd/cvxbook/ )

Emmy Blumenthal (they/them) (Jan 12 2022 at 05:17):

David Egolf said:

(Also, this is a great resource for convex optimization: https://web.stanford.edu/~boyd/cvxbook/ )

This is exactly the primary book I'm referencing. I think they develop the concept especially well and give some intuition.

My brief summary is that when certain hypotheses (feasibility is sufficient in least squares, quadratic, and linear programming) in convex and affine spaces hold for a convex objective function which we wish to minimize (the primal problem), we can write out the the dual problem Lagrangian. Maximizing the dual Lagrangian w.r.t. the Lagrange/KKT multipliers then becomes equivalent to the minimization of the primal objective function; when this occurs, we have strong duality, and sup Lagrangian = inf objective.

My research has a strong emphasis on the role of Lagrange multipliers as first-order approximations for a sort of generalized constraint force (e.g.,$ \beta = 1/(k_B T)$ in thermodynamics), so ideally any categorical P.O.V. will develop the role of Lagrange multipliers. I tried looking at @John Baez 's new paper "Compositional Thermostatics" to see if I could extract any intuition from how temperature is described, but I don't think I've explored enough to know if I can find any connections; however, I suspect there is something there as, in thermostatics, we are trying to maximize entropy---a concave function---of a distribution (a convex space) subject to the constraint of constant energy. The notion that the dual problem estimates constraints with an additional linear term in the Lagrangian is equivalent to the notion that we can approximate that heat exchange between the system and the bath is insignificant compared to the bath (the thermodynamic limit). This last paragraph is definitely a little conjecture-y, but I was inspired by the thermostatics paper to start thinking a bit about these ideas.

Thank for any help!

John Baez (Jan 12 2022 at 05:27):

Thanks! Certainly Lagrange multipliers and the Legendre transform of convex (or concave) functions are fundamental in thermodynamics, and my paper with @Owen Lynch on compositional thermostatics is a kind of warmup for studying those things in a modern way... but we don't actually say anything about them in our paper!

John Baez (Jan 12 2022 at 05:29):

So there's more to be done.

Emmy Blumenthal (they/them) (Jan 12 2022 at 05:46):

Is that to say that convex optimization hasn't been developed much in category theory or specifically in the context of thermodynamics? Do you know if anyone is working on related topics now? Thanks!

John Baez (Jan 12 2022 at 05:50):

A large chunk of thermodynamics and statistical mechanics basically is convex optimization.

John Baez (Jan 12 2022 at 05:50):

So if that's the connection you're trying to make, you can just read about thermodynamics, stat mech and convex optimization and see how they fit together.

John Baez (Jan 12 2022 at 05:51):

I was saying that modern math - the category of convex sets, operads, and such - is just starting to be applied, to clean up these ideas and make them more powerful.

Emmy Blumenthal (they/them) (Jan 12 2022 at 06:06):

I see. I'm trying to find maximally-informative subsets of large datasets by using the MacArthur Consumer Resource model (see this paper https://arxiv.org/abs/1809.04221). Each data point becomes a consumer in an ecosystem which competes with other data points for 'resources'. If all the consumers are aiming to maximize consumption (framed as a least squares problem) subject to the constraint that there are a finite amount of resources and data points can only consume certain resources at certain rates, some consumers will go extinct like a species pushed out of a niche in an ecosystem. This may be expressed by asking whether the Lagrange multiplier associated with a constraint is non-zero; if it is, the data points 'survives'. So far, we've implemented numerical experiments which have shown that training several different machine learning algorithms on these niche subsets of the MNIST training data produces higher accuracy on the test data compared to training on a randomly selected subset of the training data of the same size as the niche subset. I am hoping to find some category theory somewhere in this project, and I thought I might look to see if Lagrange multipliers are discussed at all in category theory, but I haven't found anything too elucidating yet,

Peter Arndt (Jan 12 2022 at 09:06):

Simon Willerton offered a categorical perspective on the Legendre-Fenchel transform. It's not a _categorified_ version, but maybe it helps anyway. Here are a preprint and a video.

Peter Arndt (Jan 12 2022 at 09:10):

The Legendre-Fenchel dual of a function is what Boyd/Vandenberghe call the conjugate function. It is closely related to the dual optimization problem, see Boyd/Vandenberghe, Section 5.1.6.

Peter Arndt (Jan 12 2022 at 09:11):

Your idea sounds cool!

Jules Hedges (Jan 12 2022 at 10:28):

I tried for a short time to formulate linear programming duality as a duality in the sense of category theory. I think it's likely that it's possible and a good idea, but it's definitely a research question and not a "learning" question

Jules Hedges (Jan 12 2022 at 10:30):

Side comment: Linear programming is arguably the method that the world runs on, I can't overstate the extent to which it's used behind the scenes by supply chains. If, hypothetically, methods coming from category theory could be used to make linear programming say 1% more efficient computationally, it would probably outweigh everything else we could do combined...

Simon Willerton (Jan 12 2022 at 11:50):

As well as the stuff on Legendre-Fenchel transform that Peter mentions above, I've been thinking about optimization in the context of enriched category theory more generally. As Jules said, this is a serious research project, and I'm making the occasional foray into it. One thing I have been thinking out loud at the n-Café about is the specific optimization problem of optimal transport.

In this specific problem of optimal transport, I think there's deeper connections wtih category theory that I've not yet discussed there related to Kantorovich duality and probability monads. (Paolo Perrone and Tobias Fritz have certainly been thinking about things there.) Convex spaces and concave have categorical descriptions as well in terms of probability or convexity monads. So I think this is a ripe area.

Simon Willerton (Jan 12 2022 at 11:56):

With regard to strong duality that was asked about. One view of this is that weak duality is given by the max-min inequality for a Lagrangian and strong duality is when that is actually an equality. Here are my vague categorical thoughts on that in Part II.

Viewed in this way, the max-min inequality

$\inf_{x\ge 0} \,\sup_{y\ge 0} \mathcal{L}(x,y) \ge \sup_{y\ge 0} \,\inf_{x\ge 0} \mathcal{L}(x,y)$

is a manifestation of the canonical morphism that exists between expressions with a limit and colimit interchanged: for a functor $\mathcal{L} \colon X \times Y \to R$ , there is a canonical morphism

$\mathrm{colim}_X \mathrm{lim}_Y \mathcal{L} \to \mathrm{lim}_Y \mathrm{colim}_X \mathcal{L},$

see the nLab. Similarly the minimax theorem holding is perhaps to be thought of as a statement about certain colimits and limits commuting.

Fabrizio Genovese (Jan 12 2022 at 14:36):

Me and @Dan Marsden categorified convexity for language a bunch of years ago: https://arxiv.org/pdf/1703.01204.pdf

Fabrizio Genovese (Jan 12 2022 at 14:36):

Probably tangential to what you are looking for, but maybe interesting?

Emmy Blumenthal (they/them) (Jan 12 2022 at 16:19):

Thank you so much for sending these resources! I'll be sure to read through them and would love to stay up-to-date on any research in this area. I would love to be involved in this research, but I fear that I might not have the skills/experience/knowledge to work on these issues as I'm only an undergraduate student. This thread has also indicated to me that I should put off introducing category theory to my current research project as it may significantly extend this project in unexpected ways; hopefully, I'll revisit this later in my career and will be able to introduce more category-theoretic concepts.

Emmy Blumenthal (they/them) (Jan 12 2022 at 16:20):

I try to visit the n-category cafe pretty often, so hopefully I'll see much more on optimization there sometime in the future!

Morgan Rogers (he/him) (Jan 12 2022 at 16:26):

Always great to see an undergrad looking ahead to things they would like to research involving category theory :grinning_face_with_smiling_eyes:

Emmy Blumenthal (they/them) (Jan 12 2022 at 16:28):

Thanks :) Hopefully I'll see some of you at the Adjoint School this summer for which I will be applying!

John Baez (Jan 12 2022 at 17:27):

but I fear that I might not have the skills/experience/knowledge to work on these issues as I'm only an undergraduate student.

That just means that you're younger and smarter than other people working on this subject.

John Baez (Jan 12 2022 at 17:28):

This thread has also indicated to me that I should put off introducing category theory to my current research project as it may significantly extend this project in unexpected ways...

Ha, that was quick. Is it that you were hoping something had been worked out that you could rather quickly apply?

John Baez (Jan 12 2022 at 17:30):

Anyway, that's fine.

The conversation has been useful to me already, since I just realized from what Simon said that this "strong duality" business, which I'd never heard of, has the minimax theorem in game theory as a special case.

John Baez (Jan 12 2022 at 17:31):

I occasionally teach game theory to undergrads, leading up to the proof of the minimax theorem for 2-person games, which I'm sure must seem rather nightmarishly complicated to many of them.

John Baez (Jan 12 2022 at 17:32):

In this proof, the "easy part" is proving the inequality

$\inf_{x\ge 0} \,\sup_{y\ge 0} \mathcal{L}(x,y) \ge \sup_{y\ge 0} \,\inf_{x\ge 0} \mathcal{L}(x,y)$

which you can do just by fiddling around, whereas the "hard part" is proving the reverse inequality.

John Baez (Jan 12 2022 at 17:33):

And as Simon had pointed out to me earlier, the easy part is just a general fact about limits and colimits. The hard part is something more special.

John Baez (Jan 12 2022 at 17:34):

So it's interesting that this is something that people think about and call "strong duality".

John Baez (Jan 12 2022 at 17:36):

I am hoping to find some category theory somewhere in this project, and I thought I might look to see if Lagrange multipliers are discussed at all in category theory, but I haven't found anything too elucidating yet.

I believe Simon's stuff about Legendre transforms is secretly the stuff about Lagrange multipliers that you're seeking.

John Baez (Jan 12 2022 at 17:38):

At least, that's true if the functions you're trying to maximize subject to constraints are concave, or the functions you're trying to minimize subject to constraints are concave: this is the case where the method of Lagrange multipliers becomes something called the Legendre transform.

John Baez (Jan 12 2022 at 17:40):

Hopefully I'll see some of you at the Adjoint School this summer for which I will be applying!

I hope you also apply to the American Mathematical Society's Mathematical Research Community on applied category theory. This is another summer program on applied category theory, which I'm teaching a course in.

Emmy Blumenthal (they/them) (Jan 12 2022 at 20:24):

John Baez said:

This thread has also indicated to me that I should put off introducing category theory to my current research project as it may significantly extend this project in unexpected ways...

Ha, that was quick. Is it that you were hoping something had been worked out that you could rather quickly apply?

Yes, I was hoping something had been worked out so that I could apply it to my research. However, I spoke briefly with @Brendan Fong, and he recommended that preparing a literature review on the category Conv/ConvRel could be useful to my own understanding in addition to other researchers/students.

Emmy Blumenthal (they/them) (Jan 12 2022 at 20:26):

I've been looking at that work on Legendre transforms in category theory and am finding that there might be something I can build off of! Thank you so much for all the feedback, and I'll be sure to apply to that as well!

Emmy Blumenthal (they/them) (Jan 12 2022 at 21:40):

As I'm reading and writing some things out about Legendre transform from the categorical POV, but I realize that I don't precisely understand their correspondence to Lagrange multipliers. I am noticing that the general theme is that maximizing ${\cal L} = f(x) + \sum_i \lambda_i g_i(x)$ , the Lagrangian of a (convex) cost function, $f$ (i.e., minimize f(x) w.r.t. x), where $g_i(x) \leq 0$ are the imposed constraints and $\lambda_i$ are the associated Lagrange multipliers, looks just like $\max_{p_i} \sum_i p_i x_i + f(x)$ , but I have yet to find an exact statement discussing their correspondence. However, this general concept of using Legendre transforms to work with concave functions is very familiar from statistical mechanics. Does anyone know where I can find a precise statement of this relationship?

Also, please let me know if I'm getting a sign wrong somewhere.

Thank you so much for any ideas!

John Baez (Jan 12 2022 at 21:56):

Sign issues are always annoying because some people are minimizing convex functions while others are maximizing concave ones, and separately some people might write $+\sum_i \lambda_i g_i(x)$ while others might write $-\sum_i \lambda_i g_i(x)$ : these are separate decisions. But this can be straightened out when you actually need to: then you just carefully write down your own assumptions and do your own calculations. So, I wouldn't worry about it until I need to.

John Baez (Jan 12 2022 at 21:58):

The basic idea is this, I think. You've got a function $f: \mathbb{R}^n \to \mathbb{R}$ that you're trying to maximize subject to a constraint

$g(x) = c$

where $g : \mathbb{R}^n \to \mathbb{R}$ is some other function.

John Baez (Jan 12 2022 at 21:59):

So, you look for critical points of the function

$f + \lambda g$

where $\lambda \in \mathbb{R}$ is some new variable.

John Baez (Jan 12 2022 at 22:01):

Say this just has one critical point, which of course depends on $\lambda$ . This assumption of "one critical point" will hold under some conditions.

John Baez (Jan 12 2022 at 22:03):

(To be continued, and perhaps edited.)