You're reading the public-facing archive of the Category Theory Zulip server.
To join the server you need an invite. Anybody can get an invite by contacting Matteo Capucci at name dot surname at gmail dot com.
For all things related to this archive, refer to the same person.
I wrote a paper about computing the algebraic path problem on systems which are left open to their surroundings. I'm eager to get feedback on this...I'm new to computer science so I'm hoping that people here will have a perspective on this that I don't.
A bit about the paper: The algebraic path problem is a generalization of the shortest path problem which covers a lot of ground. For any commutative quantale $R$ (i.e. a monoidal closed poset with all joins), the algebraic path problem computes the transitive closure of matrices weighted in $R$. Here is a table showing some of the problems you can solve by choosing different quantales, for example:
- $([0,\infty], \min, +)$: shortest path
- $(\{\mathrm{true},\mathrm{false}\}, \vee, \wedge)$: reachability
- $([0,\infty], \max, \min)$: maximum capacity (widest path)
- $([0,1], \max, \times)$: most reliable path
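To make the first row concrete, here is a minimal Haskell sketch (the names are mine, not from the paper) of the min-plus quantale and its matrix "multiplication", which is the basic step of the shortest path computation:

```haskell
import Data.List (transpose)

-- Weights live in [0, +infinity]; 1/0 stands in for +infinity (no edge).
type Weight = Double

inf :: Weight
inf = 1 / 0

-- In the min-plus quantale, "addition" is min and "multiplication" is +.
wPlus, wTimes :: Weight -> Weight -> Weight
wPlus  = min
wTimes = (+)

-- Matrix "multiplication" in this quantale:
-- (ab)_ij = min over k of (a_ik + b_kj), the best two-leg trip via any k.
mmul :: [[Weight]] -> [[Weight]] -> [[Weight]]
mmul a b =
  [ [ foldr wPlus inf (zipWith wTimes row col) | col <- transpose b ]
  | row <- a ]
```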
To make the algebraic path problem open, you can use the cospan formalisms developed by Baez, Courser, and Fong. More specifically, there is a bicategory where morphisms are open $R$-matrices and composition is gluing of these open $R$-matrices. The transitive closure, and therefore the solution to the algebraic path problem on open $R$-matrices, can be cast as a functor from this bicategory. I was very pleased to get the mapping to be functorial with respect to this gluing. In particular, functoriality gives an isomorphism between the solution of the algebraic path problem on a composite system and the solutions on its components. I hope that this isomorphism can inspire better algorithms for computing the algebraic path problem.
I also have a softer question. I find the conference/publishing culture of computer science to be very confusing. I'm wondering if anyone has a suggestion on a good place to submit a paper like this if my goal is to get a job in academic computer science.
it's cool
Jade -- very nice! have you seen http://stedolan.net/research/semirings.pdf -- how close is a commutative quantale to a "closed semiring"?
On to the soft question. Here is my take, worth a grain of salt. Other people might say "actually papers just like this are published in CS venue X" and they're probably right and I just didn't know about X. But as it stands, it feels like it's not yet pitched sufficiently at a CS audience. I note that one basic thing to do would be to take your formula in (3) and actually write out in pseudocode in your language of choice (or real code in Haskell or something) how to actually perform this procedure. Otherwise people will say "you've shown me a mathematical result, but I don't actually see a procedure anywhere!"
Writing up things for CS is weird, in my observation, because I think most things have to be explained very explicitly, like to an undergraduate, but there are occasional things that if you explain them too explicitly people will feel "talked down" to, and it is perhaps hard to know which is which, until you see what reviewers yell at you about.
One key element is you need to choose if you want your paper to be "Theory A" or "Theory B": https://blog.computationalcomplexity.org/2003/03/theory-and-theory-b.html
One way I think of this is "Theory B" is more about "how can we describe/classify/translate problems we want to solve" (and sometimes "is this solvable at all") and "Theory A" is more about "ok, given a specification of a problem, how efficiently can we solve it".
The two cultures only have modest overlap, and people with CS-ties on a place like this (including myself) will tend towards Theory B. But most (not all!) CS academic jobs in the U.S. have tended towards Theory A in the past. Nowadays, I think there is also just "systems engineering" stuff which I can't really classify. My description here is a bit rude and ignorant, but it seems to me: "here's a real world problem, here's a smaller version, and we just like wrote a program and ran it a bit and it did some stuff, take a look." Sometimes it's "here's the actual real world problem, and working with BIGCO we actually did write some stuff to solve it, in some sense, maybe, and we can't show you the stuff, because it's proprietary, but we're going to make some claims about it anyway."
Anyway, your result points the way towards something that Theory A might appreciate, but it's not there yet. To make it acceptable at a Theory A venue I'd think you'd want some actual claim where you say "if the input stuff has this structure then we can give these improved complexity bounds" that's very crisp, and that would be the "point". For the more systemsy "Theory A" stuff you instead find some real-world dataset and run your algorithm and some competing ones over it, and show "we don't know exactly why the data has this structure, but in practice we did better." For "Theory B" you could instead say "let's take compositionality as fundamental, and now we prove this procedure works on tiny things, and that when you take pushouts of tiny things it carries through", so now we derive a "correct-by-construction" procedure for big things. Then, ideally, you code it up in Agda or something and the proof of the required property of the result is constructed inductively with the result itself, and that's considered of interest.
Other people will I'm sure have other and better ideas as to how one would streamline it into one of those slots.
(lol one place that i thought of where this might fit more as-is is MSCS, but you already know all about it and just published there!)
Gershom said:
Jade -- very nice! have you seen http://stedolan.net/research/semirings.pdf -- how close is a commutative quantale to a "closed semiring"?
Thanks Gershom. I hadn't seen that pdf but it looks great. A (commutative?) quantale can be turned into a closed semiring with join as its $+$. The $*$ operation is given by the free monoid in the preorder of your quantale.
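In Haskell-style pseudocode (a minimal sketch; the class and method names are mine), the translation looks like this, with the closed-semiring $*$ built by iterating joins:

```haskell
-- A commutative quantale: a poset with all joins and a compatible
-- commutative monoidal product. (Only binary/empty joins shown.)
class Quantale q where
  bot    :: q            -- empty join
  join   :: q -> q -> q  -- binary join; this is the semiring (+)
  one    :: q            -- monoidal unit; the semiring 1
  tensor :: q -> q -> q  -- monoidal product; the semiring (*)

-- The closed-semiring star, x* = 1 \/ (x . x*), as a least fixed point.
-- The loop terminates only when the ascending chain stabilizes, e.g. in
-- finite quantales; in general x* is the infinite join of all powers of x.
star :: (Eq q, Quantale q) => q -> q
star x = go one
  where
    go acc =
      let acc' = join one (tensor x acc)
      in if acc' == acc then acc else go acc'
```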
Also, what you're saying about Theory A and Theory B is very helpful. It wasn't a distinction I was really familiar with and it's definitely something I should know about. Your suggestion and sketch for a more Theory A version of this paper sound good...I definitely want to see how applied I can go.
I don't think you can become a Theory A person very quickly, Jade. I know about this distinction, and everything we do is heavily Theory B (or maybe Theory C, if such a thing exists).
I and perhaps many other people find "genuine" Theory A a bit scary and imposing. But it isn't all like NP complexity stuff. Some of it really is closer to e.g. Tarjan's papers on graph algorithms. It's just the style and formalisms used are a different sort of rigorous.
One origin of Jade's paper was that I was trying to come up with a really practical application of the "network models" that @Joe Moeller and I had been coming up with. I was trying to use them to help solve shortest-path problems.
We can think of Jade's idea as a rigorous working-out of the basic idea - though she's not using network models, rather structured cospans.
However, I think the methodology will only become practical when combined with other ideas.
I keep talking about these other ideas (to my gang of grad students), but we haven't really put any time into thinking about them in detail.
One reason is that to pursue them I think we need to really forget about category theory for a while.
You're right that it would be hard to become a Theory A person very quickly. I was more thinking that it would be good to be a Theory B person with more of a slant towards Theory A. I think this would be a good strategy because Theory A is where the funding is apparently.
Also, Gershom, I'm not sure I understand your idea about Agda completely. Is the idea that it would be about designing systems rather than analyzing them?
John Baez said:
However, I think the methodology will only become practical when combined with other ideas.
Which other ideas do you mean?
I've said 'em over and over.
1) One is the idea of not trying to get the shortest path, just a "reasonably good" path. That'll let you use approximations to your monad instead of the full-fledged monad.
2) Another is to imagine data that keeps being updated over time. So, you have a big graph written as a composite of smaller open graphs, and the edges are weighted by elements of some commutative quantale... and let's say you already have an approximate solution to the algebraic path problem for the whole big graph... but then someone updates the edge weights on one of the smaller open graphs!
How do you update your solutions for the whole big graph?
Again, we don't want to compute the exact answer... that'll take a long time. But there may be some approximate thing we can do, that's fast.
3) Another is to imagine "black-boxing" one of the smaller open graphs, so all we record about it is the shortest path from each point on its "boundary" to each other point on its "boundary". This may let us keep track of a lot less information, if all we care about is how to get through this region, not move around inside it.
Jade Master said:
You're right that it would be hard to become a Theory A person very quickly. I was more thinking that it would be good to be a Theory B person with more of a slant towards Theory A. I think this would be a good strategy because Theory A is where the funding is apparently.
just move to france :kiss:
John Baez said:
How do you update your solutions for the whole big graph?
ah, incremental computing is a whole Thing :)
@Jesse Sigal showed me a paper about it a little while back which was doing a thing that i noticed was a special case of internal categories
i have some thoughts about that... but maybe ill talk about em sometime elsewhere.
John Baez said:
2) Another is to imagine data that keeps being updated over time. So, you have a big graph written as a composite of smaller open graphs, and the edges are weighted by elements of some commutative quantale... and let's say you already have an approximate solution to the algebraic path problem for the whole big graph... but then someone updates the edge weights on one of the smaller open graphs!
Thanks. I remembered these ideas a bit, I just wasn't sure this is what you meant. I think that this might be related to 2-functoriality of the solution mapping. Changes in your components can be thought of as 2-morphisms on the components. Those get mapped to changes in their solutions. The 2-dimensional aspect of functoriality guarantees that you can glue these small changes together to get a change on the whole graph.
Jade Master said:
Also, Gershom, I'm not sure I understand your idea about Agda completely. Is the idea that it would be about designing systems rather than analyzing them?
Well you have a functor, right, that sends an R-matrix to an idempotent R-matrix that is in some sense (co?)universal with regards to the one you started with, and the "algebraic path problem" is the computation of this in any specific case, right? So you want to write a function in Agda, x : RMat -> (y : IRMat, proof of co-universality of y w/r/t x). Now how do you do that? One way is you write a function that works only for very small things, which you can do exhaustively by cases. Then, for a bigger thing, you decompose it into the small things, run it on the small things, and compose the resultant matrices and proofs back up. This is often how formalized/mechanized proofs of correctness of inductive algorithms work. Your result seems to suggest that this can be done for the shortest paths problem.
so your new algorithm is maybe not better with regards to complexity, but it does come with a certified proof, and that can be useful, and constructions like that are often able to be written up and presented, because they advance our understanding of how to do "old things" in a "nice, new, proof carrying way". Just like sometimes if you write something "old" up with categorical methods it is of interest to some people for that reason alone, because it is more generalized and useful, so too with something like this.
Hope that makes sense?
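To make that shape concrete in Haskell rather than Agda, here is a toy, self-contained version for the Boolean quantale (reachability), with the Agda proof object downgraded to a runtime-checked certificate; all names here are mine:

```haskell
import Data.List (transpose)

type BMat = [[Bool]]

-- Boolean matrix "multiplication": paths of length exactly two.
mmulB :: BMat -> BMat -> BMat
mmulB a b = [ [ or (zipWith (&&) row col) | col <- transpose b ] | row <- a ]

identityB :: Int -> BMat
identityB n = [ [ i == j | j <- [0 .. n - 1] ] | i <- [0 .. n - 1] ]

addB :: BMat -> BMat -> BMat
addB = zipWith (zipWith (||))

-- Reflexive-transitive closure as a least fixed point of c |-> I + m.c,
-- returned together with a checked certificate that the result really is
-- idempotent (a stand-in for the co-universality proof).
closureB :: BMat -> (BMat, Bool)
closureB m = (c, mmulB c c == c)
  where
    n    = length m
    go a = let a' = addB (identityB n) (mmulB m a)
           in if a' == a then a else go a'
    c    = go (identityB n)
```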
Jade Master said:
Thanks. I remembered these ideas a bit, I just wasn't sure this is what you meant. I think that this might be related to 2-functoriality of the solution mapping. Changes in your components can be thought of as 2-morphisms on the components. Those get mapped to changes in their solutions. The 2-dimensional aspect of functoriality guarantees that you can glue these small changes together to get a change on the whole graph.
That's great! But we also need to start thinking like Type A personalities, in terms of algorithms and their runtimes. Do you get how Floyd-Warshall works and why it has runtime $O(n^3)$? I tend to forget, but Wikipedia explains it really well.
So, we need to think about how, after you update a little "patch", you're going to make these updates propagate across the whole graph, in a very nice efficient way.
The Floyd-Warshall algorithm is not sneaky, but it intelligently avoids unnecessary labor.
I guess we should think about the equation on Wikipedia after "and the recursive case is". This is "the heart of the Floyd-Warshall algorithm", but it's also something about how you compute the free idempotent matrix on a matrix.
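For reference, the recursive case in question is

$$\mathrm{shortestPath}(i,j,k) = \min\bigl(\mathrm{shortestPath}(i,j,k-1),\ \mathrm{shortestPath}(i,k,k-1) + \mathrm{shortestPath}(k,j,k-1)\bigr)$$

i.e. the best path from $i$ to $j$ through the first $k$ vertices either avoids vertex $k$, or splits into a best path into $k$ followed by a best path out of $k$.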
Hm. We should read this paper maybe
I prefer to just think....
But yeah, that looks like a fun paper.
I really want you to practice thinking up algorithms. (And me too!!!)
It's not good to get stuck in thinking like a category theorist all the time.
It's an amazing ability - to be able to think like a category theorist - but if you get trapped in doing it all the time it's crippling.
(Maybe I shouldn't be a thesis advisor in public, but oh well. Jade is great; I'm just trying to make her even greater.)
ive been making her learn haskell so that should be helpful for this :halo:
Haha true. And you're right John.
Good! I can't program, so getting someone to teach you programming is great. But I think I might be able to help you dream up some algorithms. I think staring at the "the heart of the Floyd-Warshall algorithm" and really understanding it would be a key step.
You see, we might naively turn a matrix $M$ into an idempotent matrix by computing $1 + M + M^2 + \cdots + M^k$
where $k$ is the length of the longest possible non-self-crossing path in our graph.
If you don't do any sneaky tricks, computing $M^2$ involves $n^3$ "multiplications" when our matrix is $n \times n$.
I'm writing "multiplications" in quotes because we're in a crazy rig where multiplication is addition.
See why we need $n^3$ of them if we don't do anything clever?
And that's just for computing $M^2$! So computing $1 + M + M^2 + \cdots + M^k$ is gonna be a lot more work.
But Floyd-Warshall says: no, you just need $n^3$ steps!
By the way, $k$ is about $n$ in the worst case scenario.
So anyway, you should think of Floyd-Warshall as a clever way of computing $1 + M + M^2 + \cdots + M^k$ with a lot fewer "multiplications", and you should understand why it works, and you should use the same sort of clever idea in other algorithms.
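Here is a minimal Haskell rendering of that cleverness (the function names are mine), written over an arbitrary "plus"/"times" pair so the same code covers the other quantales too:

```haskell
-- Floyd-Warshall over an arbitrary "plus"/"times" pair; for shortest
-- paths take plus = min and times = (+), with 0 on the diagonal and
-- 1/0 (infinity) for missing edges.
floydWarshall :: (w -> w -> w)  -- "plus": choose between alternative paths
              -> (w -> w -> w)  -- "times": concatenate paths
              -> [[w]] -> [[w]]
floydWarshall plus times d0 = foldl relax d0 [0 .. n - 1]
  where
    n = length d0
    -- The heart of the algorithm: allow vertex k as an intermediate stop.
    relax d k =
      [ [ (d !! i !! j) `plus` ((d !! i !! k) `times` (d !! k !! j))
        | j <- [0 .. n - 1] ]
      | i <- [0 .. n - 1] ]
```

That's $n$ passes of $n^2$ updates each, hence the $n^3$ count.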
I predict that there's a category-theoretic "heart" of the Floyd-Warshall algorithm, which is some nice way to understand how it computes so quickly. It's something about what happens when you add one extra point to the vertex set of your graph: it uses this idea recursively.
John Baez said:
I predict that there's a category-theoretic "heart" of the Floyd-Warshall algorithm, which is some nice way to understand how it computes so quickly. It's something about what happens when you add one extra point to the vertex set of your graph: it uses this idea recursively.
I think this adding of an extra point to the vertex set of your graph might be a pushout :thinking:
I know you said you didn't want more category theory whoops.
Right, I don't. Not until after we have an actual idea.
If you look at the algorithm you'll see we've got a graph on a bunch of vertices $1, \dots, n$, but we only consider paths from $i$ to $j$ that use the vertices $1, \dots, k$ (and of course $i$ and $j$). So we're "restricting" the graph to the set $\{1, \dots, k\}$.
Call this restricted graph $G_k$.
So, we assume we know the shortest path from $i$ to $j$ in the graph $G_k$, and then - using the "heart" of the algorithm - we find the shortest path from $i$ to $j$ in the graph $G_{k+1}$.
So when I say "adding an extra point" I mean going from $G_k$ to the slightly larger graph $G_{k+1}$.
But really the main operation we're using is our ability to take a graph and "restrict" it, getting graphs $G_1 \subseteq G_2 \subseteq \cdots \subseteq G_n$.
Right. It kind of reminds me of a filtration...like in persistent homology.
One maybe useful generalization could be to make the graphs in the sequence increase by more than one vertex at a time
Wouldn't that make the computation harder?
The topic seems appropriate to Journal of ACM.
I think you'd want to add an opening [no pun intended] section discussing classes of problems the open formulation might address; and a final section with CS nitty-gritty: algorithm as @Gershom suggested, an example or two, complexity, etc. Some of which you have in the conclusion.
Minor nit: I don't think anyone considers dynamic programming [a la Bellman] "In software engineering"; the point stands without that phrase.
John Baez said:
You see, we might naively turn a matrix $M$ into an idempotent matrix by computing $1 + M + M^2 + \cdots + M^k$
It might be useful also to compare Floyd-Warshall to a slight refinement of this time-$O(n^4)$ baseline. In particular, a standard trick (cf. the proof of Savitch's Theorem) shows how to turn this into $O(n^3 \log n)$: just iterate the map $M \mapsto M \cdot M$, starting from $1 + M$, until convergence, which will happen in at most $\log_2(n)$ steps!
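A quick Haskell sketch of that trick for the min-plus case (again with names of my own choosing):

```haskell
import Data.List (transpose)

type Weight = Double  -- use 1/0 for +infinity (missing edges)

-- min-plus matrix "multiplication": (ab)_ij = min over k of (a_ik + b_kj)
mmul :: [[Weight]] -> [[Weight]] -> [[Weight]]
mmul a b = [ [ minimum (zipWith (+) row col) | col <- transpose b ] | row <- a ]

-- Repeated squaring: start from "1 + M" (zero the diagonal, i.e. add the
-- empty path at each vertex) and square until nothing changes; that takes
-- at most log2(n) squarings, each costing one matrix multiplication.
closure :: [[Weight]] -> [[Weight]]
closure m = go [ [ if i == j then 0 else m !! i !! j | j <- idx ] | i <- idx ]
  where
    idx  = [0 .. length m - 1]
    go a = let a' = mmul a a
           in if a' == a then a else go a'
```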
Thanks! This is the kind of "standard trick" I'm completely unaware of, since I've never really studied algorithms.
@Jade Master this is a great paper. For CS venues, the most directly interested people would be those associated with the GraphBLAS community. Here is a standards document for a graph-theoretic programming paradigm centered around semirings and algebraic paths: http://www.mit.edu/~kepner/GraphBLAS/GraphBLAS-Math-release.pdf This community grew out of the high performance computing community as an effort to extend the Basic Linear Algebra Subroutines (BLAS), which support pretty much all HPC and scientific computing workloads (think Matlab- and Numpy-style programming), to more general purpose graph-theoretic computations. That community would expect that your technique produces an accelerated version of FW for arbitrary quantales, with an implementation and benchmarking to prove that it is in fact faster.
Characterizing the set of problems where this approach works would be interesting from a theoretical perspective, and the people in the GraphBLAS community would welcome it, particularly John Gilbert at UCSB, but they are in general about getting algorithms to go faster on bigger systems.
Rich Hilliard said:
The topic seems appropriate to Journal of ACM.
Thank you for the advice and your note. It is helpful :smile:
James Fairbanks said:
Jade Master this is a great paper. For CS venues, the most directly interested people would be those associated with the GraphBLAS community.
Ah thank you for the recommendations. I will try to contact some of these people.
I would like to say that I don't have an algorithm yet, just a (vague) suggestion about how to make one.
When it comes to algorithms, the remark by @Sam Tenka sounds important!
Yes. I am still working on absorbing it.