Category Theory Zulip Server Archive

You're reading the public-facing archive of the Category Theory Zulip server.


Stream: learning: questions

Topic: combining optimization problems


David Egolf (Jan 12 2022 at 23:38):

This is a broad question, inspired in part by recent discussion in #learning: questions > convex analysis.

I have the impression that category theory is good at talking about piecing smaller things together to make bigger ones. Is anyone here aware of work that has used category theory to do this with optimization problems? The idea would be to study how sub-optimization problems over smaller domains of optimization relate to the full problem.

As an example relevant to imaging, assume we wish to form an image by minimizing an objective function: $\hat{x} = \arg\min_{x \in D} \ell(\Phi x - y)$, where $\Phi$ is some forward model, each point in $D$ corresponds to a different estimate for the unknown object we are imaging, $y$ is the observed data from the actual object, and $\ell$ is some loss function. Sometimes this optimization problem is too computationally intensive to solve in a practical way.

One approach to speed this up could be to optimize different parts of the image in parallel. That might look like considering a collection of smaller optimization problems in parallel: $\hat{x}_i = \arg\min_{x \in D_i} \ell(\Phi x - y)$. One can then obtain an estimate for the overall unknown object by combining these estimates for the different parts, potentially with some additional modifications.

So, in this case, we want to talk about breaking a larger optimization problem into smaller ones, and comparing the results of the larger optimization problem to the combined results of the smaller ones.
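
To make the setup concrete, here is a minimal numerical sketch of both the full problem and the patchwise version (a toy least-squares instance; the random $\Phi$, the sizes, and the choice to freeze out-of-patch variables at zero are all illustrative assumptions, not from any paper):

```python
import numpy as np

# Toy instance of x_hat = argmin_x l(Phi x - y), with l = squared error.
rng = np.random.default_rng(0)
n = 8                               # number of unknowns ("pixels")
Phi = rng.normal(size=(12, n))      # hypothetical forward model, just random here
y = Phi @ rng.normal(size=n)        # observed data from a hidden true object

loss = lambda x: np.sum((Phi @ x - y) ** 2)

# Full problem: least squares over the whole domain D.
x_full, *_ = np.linalg.lstsq(Phi, y, rcond=None)

# Subproblems: solve each half of the image separately, freezing the other
# half at zero, then stitch the two estimates together.
x_stitched = np.zeros(n)
for idx in (slice(0, n // 2), slice(n // 2, n)):
    sol, *_ = np.linalg.lstsq(Phi[:, idx], y, rcond=None)
    x_stitched[idx] = sol

print("loss of full solve:    ", loss(x_full))      # optimal
print("loss of stitched solve:", loss(x_stitched))  # generally worse: the
                                                    # subproblems ignore coupling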

Have people been working on describing this sort of thing from the perspective of category theory?

John Baez (Jan 13 2022 at 02:00):

There are lots of optimization problems that aren't compositional, which is why life is difficult.

However, if you have an electric circuit made of wires and resistors, with current flowing through it in a time-independent way, the flow of current will minimize the power consumption (the product of voltage and current for each resistor, summed over resistors). And this type of optimization problem is compositional. Problems of this type show up in many branches of engineering.
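
As a small worked instance of that minimization principle: for two resistors in parallel carrying a fixed total current, minimizing the dissipated power over the current split recovers the familiar current-divider rule. (The resistor values here are arbitrary; this is just the standard textbook fact in code.)

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Two resistors in parallel carrying a fixed total current I. The physical
# current split is the one minimizing total dissipated power R1*i1^2 + R2*i2^2.
R1, R2, I = 2.0, 3.0, 1.0                      # ohms, ohms, amps (arbitrary)
power = lambda i1: R1 * i1**2 + R2 * (I - i1) ** 2

opt = minimize_scalar(power)
print("power-minimizing split:", opt.x)               # numerical optimum
print("current-divider rule:  ", I * R2 / (R1 + R2))  # textbook answer: they agree
```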

Studying this led Brendan Fong to the theory of decorated cospans, which is a very general approach to compositionality. You can get an introduction to it in this blog article:

which includes a chart listing various branches of engineering where this math is relevant.

David Egolf (Jan 13 2022 at 02:13):

I guess I am also interested in the case where optimization problems are almost compositional. That is, we can approximate the solution of the whole problem by solving a bunch of subproblems. But I don't know if category theory is the tool to talk about that.

Emmy Blumenthal (they/them) (Jan 13 2022 at 02:31):

Another related question is, how can we, in general, write optimization problems in category-theoretic language (and thus explore any compositional aspects of stating these problems)?

David Egolf (Jan 13 2022 at 02:31):

A related thought. Imagine we have some map $F: A \to B$ on objects and morphisms between categories $A$ and $B$. It could be that $F(f \circ g) \neq F(f) \circ F(g)$. So $F$ is not a functor.

However, imagine we have some notion of nearness in $B$, and we use that to define a new category $B'$, where we identify morphisms in $B$ that are sufficiently near. Then perhaps we could recover a functor $F': A \to B'$.

Depending on the situation, this identification of "near" things wouldn't have to collapse everything in $B$, I think. For example, imagine sending points in some data cloud to their nearest cluster. The result still has some structure.
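
A throwaway numerical toy of this "identify near things" move (the thresholds are made up and the categorical bookkeeping is entirely glossed over): take $F$ to be rounding of real numbers, with composition given by addition.

```python
# A throwaway numerical toy: F = rounding, composition = addition of reals.
F = lambda t: round(t, 1)        # F(a + b) != F(a) + F(b) in general
a, b = 0.26, 0.26
print(F(a + b), F(a) + F(b))     # 0.5 vs 0.6: F fails to preserve composition

# Identify numbers that are sufficiently near by coarse bucketing (width 0.25).
bucket = lambda t: round(t / 0.25)
print(bucket(F(a + b)) == bucket(F(a) + F(b)))   # True for this example, though
# nothing guarantees it in general - boundary cases can still land in
# different buckets, which is part of what makes this hard.
```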

John Baez (Jan 13 2022 at 02:46):

Emmy Blumenthal (they/them) said:

Another related question is, how can we, in general, write optimization problems in category-theoretic language (and thus explore any compositional aspects of stating these problems)?

I believe the practical way to attack this is to first tackle specific classes of optimization problems where it's pretty easy to see how to use categories. (The work I mentioned does this for one class.) Then, after enough work of this sort has been done, one can systematize and generalize it.

Plunging immediately into a high level of generality tends to be too hard - it's like trying to run through a brick wall.

John Baez (Jan 13 2022 at 02:57):

David Egolf said:

A related thought. Imagine we have some map $F: A \to B$ on objects and morphisms between categories $A$ and $B$. It could be that $F(f \circ g) \neq F(f) \circ F(g)$. So $F$ is not a functor.

One common way to study this situation is to work with poset-enriched categories so you can meaningfully say

$F(f) \circ F(g) \le F(f \circ g)$

This can mean "solving the problems separately and combining the answers may not give the correct answer to the combined problem, but it gives a lower bound on the correct answer." More generally you can work with 2-categories and say there's a 2-morphism

$F(f) \circ F(g) \Rightarrow F(f \circ g)$

To learn more about this the buzzword is "lax 2-functor". I think earlier we've talked about other related forms of laxness which are used to describe how a problem fails to be perfectly compositional.
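
One toy numerical reading of that inequality, assuming $F$ sends an optimization problem to its optimal value and composition minimizes over the shared variable (an illustrative choice, not the general definition): solving the pieces separately and combining the answers gives a lower bound on the composite optimum.

```python
import numpy as np

rng = np.random.default_rng(1)
f = rng.normal(size=(4, 5))    # f[x, y]: cost table for a problem "from x to y"
g = rng.normal(size=(5, 3))    # g[y, z]

# Composite problem: minimize f(x,y) + g(y,z) jointly over x, y, z.
joint = (f[:, :, None] + g[None, :, :]).min()

# Solving the pieces separately and adding the answers: a lower bound.
separate = f.min() + g.min()
print(separate <= joint)       # always True: F(f) "composed with" F(g) <= F(fg)
print(separate, joint)         # and usually strict, i.e. genuinely lax
```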

Jade Master and I used laxness to study deviations from compositionality for the "reachability semantics for Petri nets" here:

It sounds fancy but it's just about moving pebbles around between boxes.

John Baez (Jan 13 2022 at 03:02):

It sounds like you might also be interested in metagories:

"Metagory" stands for "metric approximate category": they're things where you can talk about how close a morphism comes to being the composite of two other morphisms.

I think blending category theory with notions of approximation may someday become very powerful, but I don't know much work on it yet. I'm probably ignorant of some, but still, someday there should be much more. It will take years of dedicated work... not by me, since I'm too busy with other things!

David Egolf (Jan 13 2022 at 03:46):

Thanks, @John Baez, that's a lot of interesting stuff to think over and look in to.

Jules Hedges (Jan 13 2022 at 10:05):

I have a lot of unpublished notes along these lines. A couple of years ago I constructed a category using the decorated corelations construction (https://arxiv.org/abs/1703.09888) where objects are finite sets and morphisms $X \to Y$ are (not quite, I'm handwaving some technicalities here) functions $\mathbb{R}^{|X| + |Y|} \to \mathbb{R}$ satisfying some fixed condition to make optimisation work (I was using smooth + convex, but other options are possible). That is to say, objects are "variables" and morphisms are optimisation problems on those variables. When you compose, some of the variables vanish, and the composition law is pointwise addition of the objective functions plus maximisation of the variables on the composed boundary: $(fg)(x, z) = \max_y f(x, y) + g(y, z)$. This totally works and you get a category with some nice properties, including a graphical representation of Lagrange multipliers. But I gave up on it because the operation of solving an optimisation problem fails extremely badly to respect the composition law. In general the solution of $fg$ can be more or less completely unrelated to the solutions of $f$ and $g$.
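
For concreteness, a bare-bones finite sketch of this composition law and of how badly solving fails to respect it (a toy discretization on random tables; the construction described above is over smooth convex functions, not finite grids):

```python
import numpy as np

def compose(f, g):
    """The composition law above on finite tables: (fg)(x,z) = max_y f(x,y) + g(y,z)."""
    return (f[:, :, None] + g[None, :, :]).max(axis=1)

rng = np.random.default_rng(2)
f = rng.normal(size=(3, 4))     # a morphism X -> Y, |X| = 3, |Y| = 4
g = rng.normal(size=(4, 3))     # a morphism Y -> Z
fg = compose(f, g)

# Solve each piece and the composite: the maximisers need not be related.
x_f, y_f = np.unravel_index(f.argmax(), f.shape)
y_g, z_g = np.unravel_index(g.argmax(), g.shape)
x_c, z_c = np.unravel_index(fg.argmax(), fg.shape)
print("best y for f alone:", y_f, "| best y for g alone:", y_g)  # usually differ
print("solution of fg:", (x_c, z_c))   # can be unrelated to x_f and z_g
```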

Jules Hedges (Jan 13 2022 at 10:09):

Specifically, if $\mathcal{C}$ is this category then $\arg\max$ defines a lax functor $F : \mathcal{C} \to \mathbf{Rel}$, that is, $F(f) F(g) \subseteq F(fg)$. Unwrapping, this means that if there is some $y$ such that $(x, y)$ is an optimum of $f$ and $(y, z)$ is an optimum of $g$, then $(x, z)$ is an optimum of $fg$. But in practice there is nearly always no such $y$ - optimisation problems usually have no solution in common - making that condition vacuous and completely useless for actually solving anything.
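
The lax condition and its vacuity can be checked directly on toy tables (the same finite discretization as the sketch above, with $\arg\max$ taken as the relation of all optimal index pairs):

```python
import numpy as np

rng = np.random.default_rng(3)
f = rng.normal(size=(3, 4))
g = rng.normal(size=(4, 3))
fg = (f[:, :, None] + g[None, :, :]).max(axis=1)   # (fg)(x,z) = max_y f + g

def F(h):
    """argmax as a relation: the set of all index pairs achieving the optimum."""
    return {tuple(map(int, i)) for i in np.argwhere(h == h.max())}

# Relational composition F(f);F(g): pairs (x,z) linked through a common y.
FfFg = {(x, z) for (x, y1) in F(f) for (y2, z) in F(g) if y1 == y2}
print(FfFg)              # almost always empty: the two optima share no y
print(FfFg <= F(fg))     # the lax inclusion F(f)F(g) ⊆ F(fg), vacuously true here
```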

Jules Hedges (Jan 13 2022 at 10:10):

If anyone thinks it's less useless than I thought, I could still finish that paper...

Jules Hedges (Jan 13 2022 at 10:11):

Incidentally, the failure of this project is what first suggested to me my grand challenge of understanding laxators more deeply

Jules Hedges (Jan 13 2022 at 10:14):

I think that using my framework of contexts (https://obsoletewallstreet.files.wordpress.com/2020/05/categorical-systems-theory-3.pdf) I can ""fix"" this by defining the context for a term in an objective function to be a "loss function" that summarises the effect of all the other terms on it. Again I think this would be useless in practice, because it ends up basically making every term into a copy of the entire objective function

Nathaniel Virgo (Jan 13 2022 at 11:01):

This totally works and you get a category with some nice properties, including a graphical representation of Lagrange multipliers.

Ooh, how does that work?

Jules Hedges (Jan 13 2022 at 11:34):

Here's the relevant part of my notes: [two screenshots: Screenshot-2022-01-13-at-11.34.05.png, Screenshot-2022-01-13-at-11.34.14.png]

ww (Jan 13 2022 at 13:20):

John Baez said:

> One common way to study this situation is to work with poset-enriched
> categories ... more generally you can work with 2-categories and say
> there's a 2-morphism

Aha! This is very close to what I was going on about with numerical integration schemes making diagrams that "almost commute". Much better like this. The original question seems closely related as well, because you can construct (some) numerical optimisation methods in the same way as you can construct (some) numerical integration methods, by taking a fixed point.

David Egolf (Jan 13 2022 at 16:43):

Jules Hedges said:

Specifically, if $\mathcal{C}$ is this category then $\arg\max$ defines a lax functor $F : \mathcal{C} \to \mathbf{Rel}$, that is, $F(f) F(g) \subseteq F(fg)$. Unwrapping, this means that if there is some $y$ such that $(x, y)$ is an optimum of $f$ and $(y, z)$ is an optimum of $g$, then $(x, z)$ is an optimum of $fg$. But in practice there is nearly always no such $y$ - optimisation problems usually have no solution in common - making that condition vacuous and completely useless for actually solving anything.

For image reconstruction, I am interested in the problem of reconstructing different patches of an image in parallel. We can consider each patch as an optimization problem. Let us say we have two patches, with corresponding optimization problems $f$ and $g$. The overall optimal solution will correspond to an image that (of course) only takes on a single value in the overlap of the patches. However, as you note, optimizing patch $f$ in isolation will generally result in a sub-image that does not agree on the overlap with the sub-image obtained by optimizing patch $g$ in isolation.
One idea I had, which I have not yet implemented for my specific application, is to optimize different patches in only a quasi-parallel fashion, where they can pass some information back and forth. The idea is to optimize sub-images while requiring agreement on their overlaps, which could still result in significant computational savings if the overlaps are small.
Considering how to categorify this situation - of optimization algorithms required to agree on overlaps - might be interesting.
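
A minimal sketch of that quasi-parallel idea, assuming a least-squares objective and two overlapping patches: each round re-solves one patch against the residual left by the current estimate of everything outside it, so the overlap gets renegotiated on every round. This is just block-coordinate descent (all sizes and the schedule are illustrative), not a claim about the right categorical packaging.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 10
Phi = rng.normal(size=(14, n))              # hypothetical forward model
y = Phi @ rng.normal(size=n)                # observed data
loss = lambda v: np.sum((Phi @ v - y) ** 2)

patches = [np.arange(0, 6), np.arange(4, 10)]   # two patches overlapping on {4, 5}
x = np.zeros(n)

# Quasi-parallel rounds: re-solve each patch with the rest of the image frozen,
# passing the unexplained residual back and forth between the patches.
for _ in range(25):
    for idx in patches:
        rest = np.setdiff1d(np.arange(n), idx)
        residual = y - Phi[:, rest] @ x[rest]
        x[idx], *_ = np.linalg.lstsq(Phi[:, idx], residual, rcond=None)

x_full, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print(loss(x), loss(x_full))   # the rounds drive the patchwise estimate
                               # toward the full joint solve
```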

Kyle Wilkinson (Sep 08 2023 at 17:32):

Jules Hedges said:

I have a lot of unpublished notes along these lines. A couple of years ago I constructed a category using the decorated corelations construction (https://arxiv.org/abs/1703.09888) where objects are finite sets and morphisms $X \to Y$ are (not quite, I'm handwaving some technicalities here) functions $\mathbb{R}^{|X| + |Y|} \to \mathbb{R}$ satisfying some fixed condition to make optimisation work (I was using smooth + convex, but other options are possible). That is to say, objects are "variables" and morphisms are optimisation problems on those variables. When you compose, some of the variables vanish, and the composition law is pointwise addition of the objective functions plus maximisation of the variables on the composed boundary: $(fg)(x, z) = \max_y f(x, y) + g(y, z)$. This totally works and you get a category with some nice properties, including a graphical representation of Lagrange multipliers. But I gave up on it because the operation of solving an optimisation problem fails extremely badly to respect the composition law. In general the solution of $fg$ can be more or less completely unrelated to the solutions of $f$ and $g$.

This aligns with Rockafellar's convex bifunctions, which he used to create an algebra of convex optimization problems. Rockafellar essentially proved that his bifunctions form a category in his Convex Analysis text (he shows composition, associativity, and unitality, among many other algebraic features). However, he never labeled it as a category. Perhaps categories just weren't widely known enough within the analysis field in the late 1960s.

There were two talks at ACT2023 surrounding the category of convex bifunctions, so it is gaining some recognition.

One of the ACT2023 speakers highlighted a para construction on this category to model parameters for a sequential control problem. So, there may be some useful applications. I myself am investigating a different construction and different application at the moment.

Steve Huntsman (Sep 08 2023 at 18:09):

Kyle Wilkinson said:

This aligns with Rockafellar's convex bifunctions, which he used to create an algebra of convex optimization problems. Rockafellar essentially proved that his bifunctions form a category in his Convex Analysis text (he shows composition, associativity, and unitality, among many other algebraic features). However, he never labeled it as a category. Perhaps categories just weren't widely known enough within the analysis field in the late 1960s.

There were two talks at ACT2023 surrounding the category of convex bifunctions, so it is gaining some recognition.

One of the ACT2023 speakers highlighted a para construction on this category to model parameters for a sequential control problem. So, there may be some useful applications. I myself am investigating a different construction and different application at the moment.

This reminds me of some interesting applied work in network characterization: https://dl.acm.org/doi/abs/10.1145/3084453, wherein doing some algebraic stuff that smells very CTish corresponds quite closely in practice to the results of optimizations that are themselves fairly structured. Some years ago I had thought there might be a way to arrive at something similar from CT de novo, and embarrassingly stumbled upon this afterwards.

Kyle Wilkinson (Sep 08 2023 at 20:09):

Steve Huntsman said:

This reminds me of some interesting applied work in network characterization: https://dl.acm.org/doi/abs/10.1145/3084453, wherein doing some algebraic stuff that smells very CTish corresponds quite closely in practice to the results of optimizations that are themselves fairly structured. Some years ago I had thought there might be a way to arrive at something similar from CT de novo, and embarrassingly stumbled upon this afterwards.

It is difficult to read stuff in other fields... computer network terms everywhere here!

It makes sense though that CT-in-optimization benefits would be found in heuristic improvements, which I think is what this paper is discussing: more efficiently finding heuristic bounds on network delays using a min-plus algebra instead of an LP formulation. I won't pretend to understand the algebraic source of their algorithm's efficiency after a first read though.
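
For anyone else coming to this from outside network calculus: the min-plus operation in question is the convolution $(f \otimes g)(t) = \min_{0 \le s \le t} [f(s) + g(t-s)]$. A tiny sketch on integer time steps (standard network-calculus material, nothing specific to this paper's algorithm):

```python
import numpy as np

def minplus_conv(f, g):
    """Min-plus convolution: (f ⊗ g)(t) = min over 0 <= s <= t of f(s) + g(t-s)."""
    return np.array([min(f[s] + g[t - s] for s in range(t + 1))
                     for t in range(len(f))])

t = np.arange(8)
f = np.maximum(0, 2 * (t - 1))   # rate-latency service curve: rate 2, latency 1
g = np.maximum(0, 1 * (t - 2))   # rate 1, latency 2
print(minplus_conv(f, g))        # end-to-end: rate min(2,1) = 1, latency 1 + 2 = 3
```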