Category Theory
Zulip Server
Archive

You're reading the public-facing archive of the Category Theory Zulip server.
To join the server you need an invite. Anybody can get an invite by contacting Matteo Capucci at name dot surname at gmail dot com.
For all things related to this archive refer to the same person.

Stream: community: general

Topic: Emily Riehl, proof validation of AI-generated math

Peva Blanchard (Nov 11 2025 at 13:04):

I just watched Emily Riehl's conference.

She is discussing, among other things, ways to protect mathematical practice against the risk of DOS (denial of service) due to AI slope (my formulation). The thread AI-generated papers illustrates this issue.

She suggests a norm for AI-generated proofs
image.png

Any artificially generated mathematical text will not be considered as a proof unless:

it has been communicated in both a natural language text paired with a formalization of all definitions, theorems and proofs.

the formalization has been accepted by the proof assistant and human expert referees have vetted both the formlaization and the paired text.

I don't have any question besides sharing her point (I like the idea), and maybe triggering a discussion.

Peva Blanchard (Nov 11 2025 at 13:06):

I assume the human-generated proofs will not be subject to this standard. So one practical challenge I see is deciding whether a mathematical text has been artificially generated.

Matteo Capucci (he/him) (Nov 12 2025 at 11:30):

This is slightly stronger than the conventional standard for human-generated math because for humans there's all sorts of other contextual heuristics that complement the text of a proof. For starters, we can relate to human thinking and thus trust what we read more, because we assume that what looks generated by a certain thought process has been generated as so, but most importantly (2) human mathematicians go through a socialization of their mathematical production which machines currently don't have.

What do I mean by socialization? First of all, they get an education, which means they have been taught standards and trained in the subject matter style. A mathematician trained in an official institution is trusted because we trust the people who trained them, and the overall system of mathematical production. Secondly, human mathematicians stake their reputation within a community in their publications. A machine does not have either of these, and their throughput is too high to cut them slack by checking their output manually every time, IMO. Hence Riehl's mechanisation requirement is very reasonable.

Matteo Capucci (he/him) (Nov 12 2025 at 11:33):

Let me add: the shortcomings of machines, skills notwithstanding, are basically the same as in every other creative field: lacking a life beyond the task they are accomplishing at any given time, they are not steered in their production by all those contextual issues which are guarantees (and sometimes weights) for humans.
Dually, many of the biases and inclusivity issues in science stem from these unspoken heuristics we value so much when judging other people's work!

Matteo Capucci (he/him) (Nov 12 2025 at 11:35):

Peva Blanchard said:

I assume the human-generated proofs will not be subject to this standard. So one practical challenge I see is deciding whether a mathematical text has been artificially generated.

I believe that since LLMs are making it much easier to formalize math, and given formalization is now at the edge of mainstream mathematics, it won't be too long since formalization of one's results will be considered a gold standard for publication, if not a necessary requirement.

John Baez (Nov 12 2025 at 12:49):

I don't plan to spend time formalizing any results. Luckily I'm retired so I don't need to publish anymore. Thus, I'm safe even in the dystopian scenario where formalizing one's results becomes required for an academia, and mathematicians who don't want to spend time on that are forced out. The young ones may become "indie" mathematicians who earn their living as baristas.

James Deikun (Nov 12 2025 at 13:10):

I think it won't probably become the standard unless it gets much easier than it has been. (OTOH, funding being what it is now, a lot more mathematicians will probably be making livings as baristas anyway.)

Paolo Perrone (Nov 12 2025 at 13:24):

From theorems to coffee, for a change.

Morgan Rogers (he/him) (Nov 12 2025 at 18:06):

John Baez said:

I don't plan to spend time formalizing any results.

If it comes to that, I too may take the espresso train out of academia :silence:

Kevin Carlson (Nov 12 2025 at 23:10):

I think this is a good suggestion from Riehl, but as Matteo’s thoughts themselves suggest, I fear that formalizability will soon become the “easy part” of verifying machine-generated mathematics. What is much harder to verify than correctness is importance, something which LLMs seem to have a deep-seated tendency to wildly overstate. For any paper that’s not simply the proof of an already-formalized theorem which people already really want, which would seem to me to be a very small slice of papers (and even then as the ABC saga shows, explaining the novel concepts introduced to support the proof is a heavy task) one has to answer the “why do I care?” question before the “is it correct?” question even becomes relevant. At this point, I honestly think it’s ontologically impossible for the machine on its own to provide justification for why I should care. A human mathematician collaborating with a machine for constructing definitions and proofs will have to take responsibility for their paper’s argument for its significance just as fully as they do for papers today. I think this pretty well implies that you shouldn’t even try to let a machine draft those parts of your paper.

Peva Blanchard (Nov 12 2025 at 23:18):

I was about to ask: what is it that we want to shield from AI-generated math? Rigor (is it correct) is one necessary thing, but relevance (why do I care) is a more important one.

To be fair, Emily Riehl is not saying that formal definitions, theorems and proofs are enough. They should be paired with a natural language text, and the whole package has to be vetted by human referees. So I think she includes "relevance" in her test.

Matteo Capucci (he/him) (Nov 13 2025 at 13:56):

These are excellent points! This is such a fundamental part of mathematics as a social endeavour that we give it for granted (or, at least, I do) that the AI role in mathematical production will be mostly as a task-solver rather than a research producer.

John Baez (Nov 13 2025 at 13:56):

Indeed, I expect that if formalized proofs become highly valued (e.g. by administrators at universities, or journal editors), people will generate large number of rigorous proofs of boring results, and also results that nobody understands (which can be found by searching for theorems).

Always remember Goodhart's law and plan ahead to take it into account.

Matteo Capucci (he/him) (Nov 13 2025 at 13:57):

Morgan Rogers (he/him) said:

John Baez said:

I don't plan to spend time formalizing any results.

If it comes to that, I too may take the espresso train out of academia :silence:

I don't get this strong repulsion toward formalization :sweat_smile: genuine question: would someone be willing to elaborate?

Matteo Capucci (he/him) (Nov 13 2025 at 13:58):

John Baez said:

Indeed, I expect that if formalized proofs become highly valued (e.g. by administrators at universities, or journal editors), people will generate large number of rigorous proofs of boring results, and also results that nobody understands (which can be found by searching for theorems).

This does not seem much different from the current state of the matter XD

John Baez (Nov 13 2025 at 14:05):

Imagine how the current state of affairs but amplified immensely, when the generation of boring math papers is no longer limited by the ability of math professors to type each word.

AI slop is starting to shows us this depressing future. But we can also get AIs to generate lots of rigorously verified proofs of complete crap.

John Baez (Nov 13 2025 at 14:13):

Matteo Capucci (he/him) said:

I don't get this strong repulsion toward formalization :sweat_smile: genuine question: would someone be willing to elaborate?

Sure! I didn't go into math to spend time mucking around with computers. I want to learn things, understand things and explain things. I want to grapple with the deepest mysteries of the universe! Given this, I find all direct contact with computer syntax both dull and annoying. It doesn't help me understand things: it makes me spend my time thinking about things like matching parentheses.

I also find the energy put into formalization to be largely wasted - at least as far as I'm concerned. Some people seem to think it's important to be 100.0000000% sure that every proof is correct. But I don't really care much if some math results have mistakes in them. Important mistakes tend to get caught eventually. And mistakes in math are far less likely and also far less important than mistakes in other aspects of life. It's far more likely that I'll trip, fall down the stairs and die than that anything serious will happen due to a mistake in a proof of mine!

Indeed, the whole world is going to hell with the climate crisis and the rise of autocrats. A mistake in a proof is way down my list of things to worry about, and I'm not going to waste my time fiddling around with software to reduce the chance of that.

Mike Shulman (Nov 13 2025 at 18:40):

John Baez said:

I didn't go into math to spend time mucking around with computers. I want to learn things, understand things and explain things. I want to grapple with the deepest mysteries of the universe! Given this, I find all direct contact with computer syntax both dull and annoying.

This is one place where I personally believe that AI has the potential to actually be helpful in mathematics: by providing a natural language gateway to proof formalization. I share everyone's skepticism about the ability of AI to find interesting mathematics on its own, even if it could generate a formalized proof of the correctness of what it found. But what about an AI that can read a proof written by a human in mathematical English -- or, even better, a high-level sketch of such a proof -- and generate a formalized version of that proof? And it wouldn't have to be a one-way process, either: if you try writing something and the AI fails to formalize it, it could gloss the proof assistant's error messages back to natural language to try to give you some idea of what went wrong and how to fix it.

David Corfield (Nov 13 2025 at 18:47):

Kevin Carlson said:

one has to answer the “why do I care?” question before the “is it correct?” question even becomes relevant.

"Ce qui limite le vrai, ce n’est pas le faux, c’est l’insignifiant." (René Thom)

Mike Shulman (Nov 13 2025 at 18:53):

I pontificated on this at a bit more length in my recent bulletin article.

David Michael Roberts (Nov 13 2025 at 21:13):

John Baez said:

anything serious will happen due to a mistake in a proof of mine!

Well, it might have delayed a research program by a postdoc for a couple of years.... But on the grand scale of things, that's not like a bridge falling down or a plane crashing :-)

Alex Kreitzberg (Nov 13 2025 at 22:01):

Folks intuition on certainty here are interesting to me. "Certainty" for programmed artifact allows huge compositions where nothing "goes wrong". Without "certainty", you're doomed to "fuss with details" when building machines, because when the small probability of something breaking happens - you have to find it in the composition.

The lack of certainty are why the AI's are so annoying, they're very impressive but you have to fuss with details to use them.

The human capacity to be small mistake tolerant seems, to me, to be one of our most deeply mysterious gifts. If we understood how a human could tell "an idea" was important before they could explain it in detail, I think our conversations around our machines would look very different.

John Baez (Nov 13 2025 at 22:16):

The mistake tolerance of humans grows out of billions of years of history, where somehow matter has managed to become "alive", and a small thing like a beetle or a bacterium can survive an incredible variety of difficult situations. So far machines are very brittle by comparison. The virtues of computers - for example, their ability to carry out the same sequence of steps in a completely reproducible way - are the flip side of their brittleness. Someday we may invent systems that combine the robustness of biology and the precision of machines, but so far we don't know much about the former.

Alex Kreitzberg (Nov 13 2025 at 22:20):

If my point isn't clear, "certainty" should be seen as an extremely useful property a tool can have. It isn't a reason for a tool by itself, in my opinion.

So in the context of papers published in journals, I've assumed the implicit process is to communicate "an idea" as well as enough evidence to suggest the idea has been sufficiently demonstrated. Like everybody has said, I'm not sure just the inclusion of a computer verified proof necessarily makes that easier.

John Baez (Nov 13 2025 at 22:28):

I agree with you on all this. I think right now some mathematicians are in love with the dream of completely formalized mathematics. I'm not quite sure why, since nothing they say convinces me it's worth the hassle. But it's something one can do, with enough work, and mathematicians tend to enjoy a challenge. It also lends itself well to grant proposals, the formation of large teams, and other things typical of "big science".

Alex Kreitzberg (Nov 13 2025 at 22:41):

You've got me wondering how huge "biologically fuzzy" compositions could possibly work. They obviously do, and it's clear to me now I don't know how.

David Michael Roberts (Nov 13 2025 at 22:47):

By way of contrast, here's this from Doron Zeilberger, not exactly shy when it comes to collaborating with computers:

In the future, we would very rarely have the luxury of a fully rigorous proof, and would have to get used to semi-rigorous, and even non-rigorous, experimental and heuristic, proofs. This way mathematica will advance much faster, without the strait-jacket of a "fully rigorous proof".

(it's not in the context of LLMs or AI-generated content, but he's long been a proponent of experimental mathematics being taken as seriously as that with a proof)

David Michael Roberts (Nov 13 2025 at 22:59):

I'm not sure if he really meant 'mathematica' there or if it was a slip of the finger for 'mathematics'

John Baez (Nov 13 2025 at 23:09):

Heh - I'm pretty sure he meant "mathematics".

I haven't figured out Doron's worldview yet, but he seems generally opposed to the traditionally popular theorem-based way way of doing math. He also seems keen to outrage people with strong statements.

David Michael Roberts (Nov 14 2025 at 00:02):

Oh, yes. He's an ultrafinitist, so any proofs about infinitary objects (for instance identities between infinite sums, or other such fun) are for him purely symbolic, and the actual content is the computer calculations either of finitely many interesting quantities, or symbolic manipulation of series. That people don't take such work as seriously as hardcore results that have a long tower of proofs (eg the proof by Deligne of https://en.wikipedia.org/wiki/Ramanujan%E2%80%93Petersson_conjecture for the $\tau$ -function, which used the Weil conjectures) I can imagine grates with him. Also yes, he seems to love being a contrarian, and a bit anti-establishment.