You're reading the public-facing archive of the Category Theory Zulip server.
To join the server you need an invite. Anybody can get an invite by contacting Matteo Capucci at name dot surname at gmail dot com.
For all things related to this archive refer to the same person.
Hi everyone,
Currently, I'm working on a project finding niche subsets of larger datasets using some techniques in convex optimization. As part of my research, I have come across the concept of strong duality in convex spaces, but I have yet to find a categorification of these ideas. Has anyone worked on this or knows of any resources where I can read about this? I'm aware of the categories Conv and ConvRel, but I am having a difficult time finding a definitive summary of them. My first guess is that strong duality when categorically formalized may involve a functor from the category of pointed convex spaces, Conv_\star, to the pre/total order category on \R. Formalizing the primal (minimization/maximization) problem in such a fashion would then yield the dual (maximization/minimization) problem when examining the opposite category.
What's "strong duality" for convex spaces? I might know something if you explain it, but I've never heard of it.
John Baez said:
What's "strong duality" for convex spaces? I might know something if you explain it, but I've never heard of it.
Doing a google search, I found these notes:
https://people.eecs.berkeley.edu/~elghaoui/Teaching/EE227A/lecture8.pdf
A segment that seems relevant:
strong duality
A bit more about duality, in a way I find more intuitive (from "Sparse Convex Optimization Methods for Machine Learning"):
duality
The idea described here is to approximate a convex function at each point by a supporting hyperplane. Then you can try to minimize this approximation over some domain of interest.
(I have used some of this stuff when working on reconstructing images, where the image reconstruction process can be viewed as optimizing a convex function.)
(Also, this is a great resource for convex optimization: https://web.stanford.edu/~boyd/cvxbook/ )
David Egolf said:
(Also, this is a great resource for convex optimization: https://web.stanford.edu/~boyd/cvxbook/ )
This is exactly the primary book I'm referencing. I think they develop the concept especially well and give some intuition.
My brief summary is that when certain hypotheses (feasibility is sufficient in least squares, quadratic, and linear programming) in convex and affine spaces hold for a convex objective function which we wish to minimize (the primal problem), we can write out the the dual problem Lagrangian. Maximizing the dual Lagrangian w.r.t. the Lagrange/KKT multipliers then becomes equivalent to the minimization of the primal objective function; when this occurs, we have strong duality, and sup Lagrangian = inf objective.
My research has a strong emphasis on the role of Lagrange multipliers as first-order approximations for a sort of generalized constraint force (e.g.,$ \beta = 1/(k_B T)$ in thermodynamics), so ideally any categorical P.O.V. will develop the role of Lagrange multipliers. I tried looking at @John Baez 's new paper "Compositional Thermostatics" to see if I could extract any intuition from how temperature is described, but I don't think I've explored enough to know if I can find any connections; however, I suspect there is something there as, in thermostatics, we are trying to maximize entropy---a concave function---of a distribution (a convex space) subject to the constraint of constant energy. The notion that the dual problem estimates constraints with an additional linear term in the Lagrangian is equivalent to the notion that we can approximate that heat exchange between the system and the bath is insignificant compared to the bath (the thermodynamic limit). This last paragraph is definitely a little conjecture-y, but I was inspired by the thermostatics paper to start thinking a bit about these ideas.
Thank for any help!
Thanks! Certainly Lagrange multipliers and the Legendre transform of convex (or concave) functions are fundamental in thermodynamics, and my paper with @Owen Lynch on compositional thermostatics is a kind of warmup for studying those things in a modern way... but we don't actually say anything about them in our paper!
So there's more to be done.
Is that to say that convex optimization hasn't been developed much in category theory or specifically in the context of thermodynamics? Do you know if anyone is working on related topics now? Thanks!
A large chunk of thermodynamics and statistical mechanics basically is convex optimization.
So if that's the connection you're trying to make, you can just read about thermodynamics, stat mech and convex optimization and see how they fit together.
I was saying that modern math - the category of convex sets, operads, and such - is just starting to be applied, to clean up these ideas and make them more powerful.
I see. I'm trying to find maximally-informative subsets of large datasets by using the MacArthur Consumer Resource model (see this paper https://arxiv.org/abs/1809.04221). Each data point becomes a consumer in an ecosystem which competes with other data points for 'resources'. If all the consumers are aiming to maximize consumption (framed as a least squares problem) subject to the constraint that there are a finite amount of resources and data points can only consume certain resources at certain rates, some consumers will go extinct like a species pushed out of a niche in an ecosystem. This may be expressed by asking whether the Lagrange multiplier associated with a constraint is non-zero; if it is, the data points 'survives'. So far, we've implemented numerical experiments which have shown that training several different machine learning algorithms on these niche subsets of the MNIST training data produces higher accuracy on the test data compared to training on a randomly selected subset of the training data of the same size as the niche subset. I am hoping to find some category theory somewhere in this project, and I thought I might look to see if Lagrange multipliers are discussed at all in category theory, but I haven't found anything too elucidating yet,
Simon Willerton offered a categorical perspective on the Legendre-Fenchel transform. It's not a _categorified_ version, but maybe it helps anyway. Here are a preprint and a video.
The Legendre-Fenchel dual of a function is what Boyd/Vandenberghe call the conjugate function. It is closely related to the dual optimization problem, see Boyd/Vandenberghe, Section 5.1.6.
Your idea sounds cool!
I tried for a short time to formulate linear programming duality as a duality in the sense of category theory. I think it's likely that it's possible and a good idea, but it's definitely a research question and not a "learning" question
Side comment: Linear programming is arguably the method that the world runs on, I can't overstate the extent to which it's used behind the scenes by supply chains. If, hypothetically, methods coming from category theory could be used to make linear programming say 1% more efficient computationally, it would probably outweigh everything else we could do combined...
As well as the stuff on Legendre-Fenchel transform that Peter mentions above, I've been thinking about optimization in the context of enriched category theory more generally. As Jules said, this is a serious research project, and I'm making the occasional foray into it. One thing I have been thinking out loud at the n-Café about is the specific optimization problem of optimal transport.
In this specific problem of optimal transport, I think there's deeper connections wtih category theory that I've not yet discussed there related to Kantorovich duality and probability monads. (Paolo Perrone and Tobias Fritz have certainly been thinking about things there.) Convex spaces and concave have categorical descriptions as well in terms of probability or convexity monads. So I think this is a ripe area.
With regard to strong duality that was asked about. One view of this is that weak duality is given by the max-min inequality for a Lagrangian and strong duality is when that is actually an equality. Here are my vague categorical thoughts on that in Part II.
Viewed in this way, the max-min inequality
is a manifestation of the canonical morphism that exists between expressions with a limit and colimit interchanged: for a functor , there is a canonical morphism
see the nLab. Similarly the minimax theorem holding is perhaps to be thought of as a statement about certain colimits and limits commuting.
Me and @Dan Marsden categorified convexity for language a bunch of years ago: https://arxiv.org/pdf/1703.01204.pdf
Probably tangential to what you are looking for, but maybe interesting?
Thank you so much for sending these resources! I'll be sure to read through them and would love to stay up-to-date on any research in this area. I would love to be involved in this research, but I fear that I might not have the skills/experience/knowledge to work on these issues as I'm only an undergraduate student. This thread has also indicated to me that I should put off introducing category theory to my current research project as it may significantly extend this project in unexpected ways; hopefully, I'll revisit this later in my career and will be able to introduce more category-theoretic concepts.
I try to visit the n-category cafe pretty often, so hopefully I'll see much more on optimization there sometime in the future!
Always great to see an undergrad looking ahead to things they would like to research involving category theory :grinning_face_with_smiling_eyes:
Thanks :) Hopefully I'll see some of you at the Adjoint School this summer for which I will be applying!
but I fear that I might not have the skills/experience/knowledge to work on these issues as I'm only an undergraduate student.
That just means that you're younger and smarter than other people working on this subject.
This thread has also indicated to me that I should put off introducing category theory to my current research project as it may significantly extend this project in unexpected ways...
Ha, that was quick. Is it that you were hoping something had been worked out that you could rather quickly apply?
Anyway, that's fine.
The conversation has been useful to me already, since I just realized from what Simon said that this "strong duality" business, which I'd never heard of, has the minimax theorem in game theory as a special case.
I occasionally teach game theory to undergrads, leading up to the proof of the minimax theorem for 2-person games, which I'm sure must seem rather nightmarishly complicated to many of them.
In this proof, the "easy part" is proving the inequality
which you can do just by fiddling around, whereas the "hard part" is proving the reverse inequality.
And as Simon had pointed out to me earlier, the easy part is just a general fact about limits and colimits. The hard part is something more special.
So it's interesting that this is something that people think about and call "strong duality".
I am hoping to find some category theory somewhere in this project, and I thought I might look to see if Lagrange multipliers are discussed at all in category theory, but I haven't found anything too elucidating yet.
I believe Simon's stuff about Legendre transforms is secretly the stuff about Lagrange multipliers that you're seeking.
At least, that's true if the functions you're trying to maximize subject to constraints are concave, or the functions you're trying to minimize subject to constraints are concave: this is the case where the method of Lagrange multipliers becomes something called the Legendre transform.
Hopefully I'll see some of you at the Adjoint School this summer for which I will be applying!
I hope you also apply to the American Mathematical Society's Mathematical Research Community on applied category theory. This is another summer program on applied category theory, which I'm teaching a course in.
John Baez said:
This thread has also indicated to me that I should put off introducing category theory to my current research project as it may significantly extend this project in unexpected ways...
Ha, that was quick. Is it that you were hoping something had been worked out that you could rather quickly apply?
Yes, I was hoping something had been worked out so that I could apply it to my research. However, I spoke briefly with @Brendan Fong, and he recommended that preparing a literature review on the category Conv/ConvRel could be useful to my own understanding in addition to other researchers/students.
I've been looking at that work on Legendre transforms in category theory and am finding that there might be something I can build off of! Thank you so much for all the feedback, and I'll be sure to apply to that as well!
As I'm reading and writing some things out about Legendre transform from the categorical POV, but I realize that I don't precisely understand their correspondence to Lagrange multipliers. I am noticing that the general theme is that maximizing , the Lagrangian of a (convex) cost function, (i.e., minimize f(x) w.r.t. x), where are the imposed constraints and are the associated Lagrange multipliers, looks just like , but I have yet to find an exact statement discussing their correspondence. However, this general concept of using Legendre transforms to work with concave functions is very familiar from statistical mechanics. Does anyone know where I can find a precise statement of this relationship?
Also, please let me know if I'm getting a sign wrong somewhere.
Thank you so much for any ideas!
Sign issues are always annoying because some people are minimizing convex functions while others are maximizing concave ones, and separately some people might write while others might write : these are separate decisions. But this can be straightened out when you actually need to: then you just carefully write down your own assumptions and do your own calculations. So, I wouldn't worry about it until I need to.
The basic idea is this, I think. You've got a function that you're trying to maximize subject to a constraint
where is some other function.
So, you look for critical points of the function
where is some new variable.
Say this just has one critical point, which of course depends on . This assumption of "one critical point" will hold under some conditions.
(To be continued, and perhaps edited.)