You're reading the public-facing archive of the Category Theory Zulip server.
To join the server you need an invite. Anybody can get an invite by contacting Matteo Capucci at name dot surname at gmail dot com.
For all things related to this archive refer to the same person.
Today Epoch AI has announced a math benchmark to test AI systems comprised of extremely hard problems in research math.
You can check the paper, the announcement video, or their website for more details, including interviews with Borcherds, Tao, Chen, and Gowers (!!)
Something I thought the people here on Zulip would appreciate to know is that some of the problems in the benchmark were specifically on category theory :)
I'm impatient and lazy. I did a ctrl-F on the paper to try to see which problems on category theory those were. I didn't see anything. Can you tell us what the problems were?
I think the benchmark has to be kept mostly secret, unfortunately for our interest in what category-theoretic problems with numerical answers people have got in there!
Todd Trimble said:
I'm impatient and lazy. I did a ctrl-F on the paper to try to see which problems on category theory those were. I didn't see anything. Can you tell us what the problems were?
I wish I could, but since data contamination¹ is an issue, I unfortunately can't go into more details than saying category theory was one of the areas which the benchmark aimed to cover (as mentioned in the paper)
¹i.e. when problems and solutions of a benchmark get leaked and models end up being trained on those, so they end up being able to solve them by merely reproducing what they have seem
Oh, you're a co-author, Emily! So I guess you were involved in constructing the category theory-related problems, then? Very cool.
Kevin Carlson said:
Oh, you're a co-author, Emily! So I guess you were involved in constructing the category theory-related problems, then? Very cool.
Yep! It was super cool coming up with them, and I even ended up figuring a number of things that are going to be super useful in some of the research I've been doing!
There was one particular problem which I ended up not submitting to FrontierMath because I had a certain issue with it, so I ended up submitting it to a different benchmark (this one)
I think there's a good chance I'll be able to share it here once that other benchmark is over
(i.e. if either the problem gets rejected or if it makes into the dataset but not into the private subset of it, which according to the description on their website will be "small")
Thank you Emily! Follow-up question: can numerical solutions be infinite sequences of numbers? Or is it that all of the posed questions are single numbers or maybe finite sets of numbers?
Todd Trimble said:
Thank you Emily! Follow-up question: can numerical solutions be infinite sequences of numbers? Or is it that all of the posed questions are single numbers or maybe finite sets of numbers?
I think (judging from the paper) that the allowed kinds of solutions were single numbers, finite sets of numbers, or SymPy
objects (so e.g. things like , matrices, etc.). Most likely infinite sequences or infinite objects in general could not be allowed because it'd make automatically verifying the model's answers too hard =/
(I'm not saying this is the case though, just what I think is probably true)
At least, it seems hard to quantify how strongly you believe that the model is generating the correct infinite stream, unless it gives you a program that produces that stream that you can put in bisimulation with a known-correct one. But that would add a whole lot of trouble!
I saw the paper linked on Mathstodon, and was having a hard time imagining what on earth the CT problems could be. That the answers are so highly constrained makes the whole exercise somehow differently interesting for me. Now I'm even more mystified as to the kinds of problems they could be.
I have been contacted more than once through various channels by various companies wanting me to use my expertise in mathematics to come up with problems for the purposes of AI research — but the kinds of problems they are after are not ones I really consider (IMO-style, mostly).
Obviously we can't know what the specific problems are, but how about giving a general hint of the type of problem? Something like "enumerate the number of non-isomorphic/equivalent groupoids with n morphisms" seems like a problem that falls in the realm of "having a numeric answer", but also feels less like category theory.
I'm a little apprehensive on talking about the problems that made it into the dataset, but likely I'll be able to talk a bit about the problems I submitted to the other benchmark project that had been running during October (this one, from Scale AI) once it's over.
For those, I tried to make sure that a core part of the problem involved non-trivial category theory, and often the problem involved some combinatorics after that.
There was one problem that I was originally going to submit to FrontierMath but ended up not doing so due to a few reasons (in the end I also wasn't completely happy with how it came out).
Hopefully I'll be able to share it here once that other benchmark is over, but for now I can say that the core category theory part of that problem was "Isbell duality" :)
Let me see if I can make up a semi-compelling example since Emily can't share much here. Aiming for a lower level than this dataset, more like a problem you could give in a graduate-level course, how about "How many functors between the following two very small categories are such that left Kan extension along them is full?"
This is just on my mind because I recently had to characterize which functors have full left Kan extensions, and it was quite doable but didn't seem to be findable in the literature.
Now, depending on which categories you chose in my problem the ratio of category theory to combinatorics could be arbitrary. It'd be interesting to try to get more CT and less combinatorics.
Kevin Carlson said:
This is just on my mind because I recently had to characterize which functors have full left Kan extensions, and it was quite doable but didn't seem to be findable in the literature.
This sounds quite interesting! Were you after general conditions for the left Kan extensions to be full, or was the setup somewhat specific?
(I'd love to hear more either way! :)
Ah, I was misremembering; I was looking for conditions on a general such that has epimorphic unit and such that it preserves monos and convinced by self that necessary (I think?) and sufficient conditions are that be full and that 's comma categories be confluent. This was in the middle of working on a conjecture of Ben Bumpus and some of his collaborators.
This was a fun exercise because I remember it as the first time talking to Claude "helped" me with a real problem, in that it gave me obviously wrong guesses to fix, which was easier than generating wrong guesses for myself.
Kevin Carlson said:
Ah, I was misremembering; I was looking for conditions on a general such that has epimorphic unit and such that it preserves monos and convinced by self that necessary (I think?) and sufficient conditions are that be full and that 's comma categories be confluent. This was in the middle of working on a conjecture of Ben Bumpus and some of his collaborators.
Oh, that sounds really interesting! Did you end up writing this down? I've been gathering similar such conditions on Clowder¹, and would love to have a look at it!
(¹More specifically, Item (f) of Item 4 of Proposition 8.5.1.1.2 (as of writing), here)
Kevin Carlson said:
This was a fun exercise because I remember it as the first time talking to Claude "helped" me with a real problem, in that it gave me obviously wrong guesses to fix, which was easier than generating wrong guesses for myself.
I've found AI tools have been surprisingly useful lately (at least compared to GPT4, which was pretty much unusable for math when it first came out). A while ago they actually helped me compute a coend which I otherwise would be struggling with for a long time.
I've also found them to be pretty useful for stuff like generating programs to explicitly compute traces of categories, like here (context: MO 482041)