Category Theory
Zulip Server
Archive

You're reading the public-facing archive of the Category Theory Zulip server.
To join the server you need an invite. Anybody can get an invite by contacting Matteo Capucci at name dot surname at gmail dot com.
For all things related to this archive refer to the same person.


Stream: community: our work

Topic: Emily de Oliveira Santos


view this post on Zulip Emily (she/her) (Nov 08 2024 at 22:58):

Today Epoch AI has announced a math benchmark to test AI systems comprised of extremely hard problems in research math.

You can check the paper, the announcement video, or their website for more details, including interviews with Borcherds, Tao, Chen, and Gowers (!!)

Something I thought the people here on Zulip would appreciate to know is that some of the problems in the benchmark were specifically on category theory :)

view this post on Zulip Todd Trimble (Nov 08 2024 at 23:06):

I'm impatient and lazy. I did a ctrl-F on the paper to try to see which problems on category theory those were. I didn't see anything. Can you tell us what the problems were?

view this post on Zulip Kevin Carlson (Nov 08 2024 at 23:14):

I think the benchmark has to be kept mostly secret, unfortunately for our interest in what category-theoretic problems with numerical answers people have got in there!

view this post on Zulip Emily (she/her) (Nov 08 2024 at 23:15):

Todd Trimble said:

I'm impatient and lazy. I did a ctrl-F on the paper to try to see which problems on category theory those were. I didn't see anything. Can you tell us what the problems were?

I wish I could, but since data contamination¹ is an issue, I unfortunately can't go into more details than saying category theory was one of the areas which the benchmark aimed to cover (as mentioned in the paper)

¹i.e. when problems and solutions of a benchmark get leaked and models end up being trained on those, so they end up being able to solve them by merely reproducing what they have seem

view this post on Zulip Kevin Carlson (Nov 08 2024 at 23:17):

Oh, you're a co-author, Emily! So I guess you were involved in constructing the category theory-related problems, then? Very cool.

view this post on Zulip Emily (she/her) (Nov 08 2024 at 23:22):

Kevin Carlson said:

Oh, you're a co-author, Emily! So I guess you were involved in constructing the category theory-related problems, then? Very cool.

Yep! It was super cool coming up with them, and I even ended up figuring a number of things that are going to be super useful in some of the research I've been doing!

view this post on Zulip Emily (she/her) (Nov 08 2024 at 23:24):

There was one particular problem which I ended up not submitting to FrontierMath because I had a certain issue with it, so I ended up submitting it to a different benchmark (this one)

view this post on Zulip Emily (she/her) (Nov 08 2024 at 23:24):

I think there's a good chance I'll be able to share it here once that other benchmark is over

(i.e. if either the problem gets rejected or if it makes into the dataset but not into the private subset of it, which according to the description on their website will be "small")

view this post on Zulip Todd Trimble (Nov 08 2024 at 23:53):

Thank you Emily! Follow-up question: can numerical solutions be infinite sequences of numbers? Or is it that all of the posed questions are single numbers or maybe finite sets of numbers?

view this post on Zulip Emily (she/her) (Nov 08 2024 at 23:59):

Todd Trimble said:

Thank you Emily! Follow-up question: can numerical solutions be infinite sequences of numbers? Or is it that all of the posed questions are single numbers or maybe finite sets of numbers?

I think (judging from the paper) that the allowed kinds of solutions were single numbers, finite sets of numbers, or SymPy objects (so e.g. things like eln(2π)e^{\ln(2\pi)}, matrices, etc.). Most likely infinite sequences or infinite objects in general could not be allowed because it'd make automatically verifying the model's answers too hard =/

view this post on Zulip Emily (she/her) (Nov 08 2024 at 23:59):

(I'm not saying this is the case though, just what I think is probably true)

view this post on Zulip Kevin Carlson (Nov 09 2024 at 00:16):

At least, it seems hard to quantify how strongly you believe that the model is generating the correct infinite stream, unless it gives you a program that produces that stream that you can put in bisimulation with a known-correct one. But that would add a whole lot of trouble!

view this post on Zulip David Michael Roberts (Nov 09 2024 at 02:54):

I saw the paper linked on Mathstodon, and was having a hard time imagining what on earth the CT problems could be. That the answers are so highly constrained makes the whole exercise somehow differently interesting for me. Now I'm even more mystified as to the kinds of problems they could be.

I have been contacted more than once through various channels by various companies wanting me to use my expertise in mathematics to come up with problems for the purposes of AI research — but the kinds of problems they are after are not ones I really consider (IMO-style, mostly).

view this post on Zulip David Michael Roberts (Nov 09 2024 at 02:56):

Obviously we can't know what the specific problems are, but how about giving a general hint of the type of problem? Something like "enumerate the number of non-isomorphic/equivalent groupoids with n morphisms" seems like a problem that falls in the realm of "having a numeric answer", but also feels less like category theory.

view this post on Zulip Emily (she/her) (Nov 09 2024 at 13:18):

I'm a little apprehensive on talking about the problems that made it into the dataset, but likely I'll be able to talk a bit about the problems I submitted to the other benchmark project that had been running during October (this one, from Scale AI) once it's over.

view this post on Zulip Emily (she/her) (Nov 09 2024 at 13:18):

For those, I tried to make sure that a core part of the problem involved non-trivial category theory, and often the problem involved some combinatorics after that.

view this post on Zulip Emily (she/her) (Nov 09 2024 at 13:19):

There was one problem that I was originally going to submit to FrontierMath but ended up not doing so due to a few reasons (in the end I also wasn't completely happy with how it came out).

view this post on Zulip Emily (she/her) (Nov 09 2024 at 13:19):

Hopefully I'll be able to share it here once that other benchmark is over, but for now I can say that the core category theory part of that problem was "Isbell duality" :)

view this post on Zulip Kevin Carlson (Nov 09 2024 at 22:06):

Let me see if I can make up a semi-compelling example since Emily can't share much here. Aiming for a lower level than this dataset, more like a problem you could give in a graduate-level course, how about "How many functors between the following two very small categories are such that left Kan extension along them is full?"

view this post on Zulip Kevin Carlson (Nov 09 2024 at 22:07):

This is just on my mind because I recently had to characterize which functors have full left Kan extensions, and it was quite doable but didn't seem to be findable in the literature.

view this post on Zulip Kevin Carlson (Nov 09 2024 at 22:08):

Now, depending on which categories you chose in my problem the ratio of category theory to combinatorics could be arbitrary. It'd be interesting to try to get more CT and less combinatorics.

view this post on Zulip Emily (she/her) (Nov 11 2024 at 18:55):

Kevin Carlson said:

This is just on my mind because I recently had to characterize which functors have full left Kan extensions, and it was quite doable but didn't seem to be findable in the literature.

This sounds quite interesting! Were you after general conditions for the left Kan extensions to be full, or was the setup somewhat specific?

view this post on Zulip Emily (she/her) (Nov 11 2024 at 18:56):

(I'd love to hear more either way! :)

view this post on Zulip Kevin Carlson (Nov 11 2024 at 20:16):

Ah, I was misremembering; I was looking for conditions on a general ff such that f!f_! has epimorphic unit and such that it preserves monos and convinced by self that necessary (I think?) and sufficient conditions are that ff be full and that ff's comma categories be confluent. This was in the middle of working on a conjecture of Ben Bumpus and some of his collaborators.

view this post on Zulip Kevin Carlson (Nov 11 2024 at 20:18):

This was a fun exercise because I remember it as the first time talking to Claude "helped" me with a real problem, in that it gave me obviously wrong guesses to fix, which was easier than generating wrong guesses for myself.

view this post on Zulip Emily (she/her) (Nov 14 2024 at 18:01):

Kevin Carlson said:

Ah, I was misremembering; I was looking for conditions on a general ff such that f!f_! has epimorphic unit and such that it preserves monos and convinced by self that necessary (I think?) and sufficient conditions are that ff be full and that ff's comma categories be confluent. This was in the middle of working on a conjecture of Ben Bumpus and some of his collaborators.

Oh, that sounds really interesting! Did you end up writing this down? I've been gathering similar such conditions on Clowder¹, and would love to have a look at it!

(¹More specifically, Item (f) of Item 4 of Proposition 8.5.1.1.2 (as of writing), here)

view this post on Zulip Emily (she/her) (Nov 14 2024 at 18:05):

Kevin Carlson said:

This was a fun exercise because I remember it as the first time talking to Claude "helped" me with a real problem, in that it gave me obviously wrong guesses to fix, which was easier than generating wrong guesses for myself.

I've found AI tools have been surprisingly useful lately (at least compared to GPT4, which was pretty much unusable for math when it first came out). A while ago they actually helped me compute a coend which I otherwise would be struggling with for a long time.

view this post on Zulip Emily (she/her) (Nov 14 2024 at 18:06):

I've also found them to be pretty useful for stuff like generating programs to explicitly compute traces of categories, like here (context: MO 482041)