My wife sent me this, a paper heralding "the AI scientist", an ensemble of AI tools allegedly capable of automating the whole research ecosystem. For those on Bluesky, this has been critiqued admirably by Carl T. Bergstrom, but it is a sign that scientific communities need to move fast to avoid being submerged by a tidal wave of AI-generated junk papers. I know from direct experience that these are already showing up in isolation, but the construction of a system for rolling them out is an alarming development.
Since several organizers of conferences and editors of journals are present here, I wanted to launch a discussion of what we can collectively do to manage this trend.
Ok, it doesn’t work well now. But it is not impossible that AI will end up replacing researchers ahah (however ChatGPT says it will not happen tomorrow).
I would have liked to have more details about what Carl T. Bergstrom was thinking about when he wrote: "Given what I know of fundamental limits to what LLMs can do, I see no reason to agree."
What are these fundamental limits please?
I think nobody knows whether there are such fundamental limits: maybe there are, maybe there aren't.
One way to understand LLM limitations and abilities is to actually examine their code. GPT2 is only 400 lines of Haskell (https://github.com/tensor-fusion/GPT-Haskell), and is also available as an Excel spreadsheet (with training data) and a giant recursive SQL query. You can see for yourself whether the code is "thinking" or "averaging" (my favorite metaphor) or something else entirely. And then ask and answer math questions about the computational model, such as whether it can compute all functions, etc. I understand many LLM architectures cannot be 'Turing complete' for example, while others are.
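To make that concrete, here is a toy sketch of a single attention head, written from scratch (this is not the GPT-Haskell code, and it omits learned projections, multiple heads, and positional encodings): the output is literally a softmax-weighted average of the value vectors, which is where my "averaging" metaphor comes from.

```haskell
-- Toy single attention head (illustration only, not the GPT-Haskell code).
type Vec = [Double]

dot :: Vec -> Vec -> Double
dot u v = sum (zipWith (*) u v)

softmax :: [Double] -> [Double]
softmax xs = map (/ total) exps
  where
    m     = maximum xs                  -- subtract the max for stability
    exps  = map (\x -> exp (x - m)) xs
    total = sum exps

-- Attend from one query to a list of (key, value) pairs:
-- score each key against the query, softmax the scores,
-- then take the weighted average of the values.
attend :: Vec -> [(Vec, Vec)] -> Vec
attend q kvs = foldr1 (zipWith (+)) weighted
  where
    d        = fromIntegral (length q)
    scores   = [ dot q k / sqrt d | (k, _) <- kvs ]
    weights  = softmax scores
    weighted = [ map (w *) v | (w, (_, v)) <- zip weights kvs ]

main :: IO ()
main = print (attend [1, 0] [([1, 0], [10, 0]), ([0, 1], [0, 10])])
-- prints roughly [6.7, 3.3]: a weighted average of the two value vectors
```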
The problem is not whether AI could eventually produce publication-worthy papers (putting aside the fact that some AI-written papers have been accepted to journals), it's that we do not have the human time resources or confidence in existing AI to review papers that can be produced with this technology at a much higher rate than human-written papers can be.
The aim of science is not to manufacture knowledge, it is to achieve understanding, whether by humans or (in the AI fantasy future) by systems capable of leveraging that knowledge for our benefit; to me, the latter is certainly a lower priority. Having large volumes of imitation academic papers which we don't have confidence in the content of because we don't have experts with enough time to look at them will serve only to undermine the scientific process as a whole. Our jobs are on the line, not because we are going to be replaced but because our good-faith research output risks being drowned out by this firehose.
Ryan Wisnesky said:
I understand many LLM architectures cannot be 'Turing complete' for example, while others are.
Very general communication complexity considerations (specifically, relating to the computation of the attention softmax) show that a single transformer attention layer cannot reliably compute large function composition queries. An example of a composition query is the concatenation of A, B, and Q, where A is of the form “Alice’s mother is Andrea; Bob’s mother is Betsy; …”, B is of the form “Andrea is a teacher; Betsy is a surgeon; …”, and Q is of the form “What is Alice’s mother’s job?” The linked paper shows that if A and B are large—in a way irrelevant to context windows—then a transformer cannot answer Q reliably.
The same paper shows that the computation of a transformer with L layers on a prompt of length n can be performed using O(L log n) bits of memory under various polynomial scale assumptions. As a corollary, multi-layer transformers cannot solve the notoriously easy problems 2-SAT or Horn-SAT at scale unless L = NL.
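A rough formalization of those two claims, in my own notation (not the paper's): the composition query packages two functions and asks for their composite,
$$A \rightsquigarrow f : X \to Y \ (\text{child} \mapsto \text{mother}), \qquad B \rightsquigarrow g : Y \to Z \ (\text{person} \mapsto \text{job}), \qquad Q = (g \circ f)(\text{Alice}),$$
and the lower bound says a single attention layer cannot answer $Q$ reliably once $|X|$ is large relative to the layer's number of heads, embedding dimension, and numerical precision. For the second claim: under the paper's polynomial-scale assumptions, the output of an $L$-layer transformer on a length-$n$ prompt is computable in $\mathsf{SPACE}(O(L \log n))$, so for constant $L$ everything it decides lies in logspace $\mathsf{L}$; since 2-SAT is NL-complete, such a transformer cannot solve 2-SAT unless $\mathsf{L} = \mathsf{NL}$.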
On the other hand, the representational power of multi-head transformer architectures is bounded below by n-gram models, which can perform impressively in practice.
Morgan Rogers (he/him) said:
Having large volumes of imitation academic papers which we don't have confidence in the content of because we don't have experts with enough time to look at them will serve only to undermine the scientific process as a whole. Our jobs are on the line, not because we are going to be replaced but because our good-faith research output risks being drowned out by this firehose.
I think it will not be too difficult to identify whether a paper is submitted by a researcher or an AI. It suffices to use Google to find whether the name of the author corresponds to a real person or not. But to deal with cases where the author does not have much of an online presence, and to save time, it could be helpful to obligate authors to use ORCID identifiers.
It's going to be difficult. Off the top of my head, I can only see band-aid, police-like actions:
Maybe keep an eye on watermarking technologies (although there are some "conflicts of interest" ...)
I am, unfortunately, not very optimistic for the mid-term: I bet there will be an arms race between generators vs detectors.
Jean-Baptiste Vienney said:
I think it will not be too difficult to identify whether a paper is submitted by a researcher or an AI.
The question is not of detecting them, but of putting a policy in place of what to do with them, and under what conditions; that's what I want to discuss.
Morgan Rogers (he/him) said:
The aim of science is not to manufacture knowledge, it is to achieve understanding
But what really is understanding? It is a vague notion. Knowing what is true or false, or what the answers to precise questions are, is sufficient.
Sufficient for what? I'm not interested in just "knowing what is true or false or what are the answers to precise questions". If all you learned at school was a series of statements and whether they were true or false, it must have been quite an unpleasant experience.
There are further significant ethical questions around plagiarism, considering that a number of medium-sized academic publishers are selling books and research papers to AI companies for their training data.
Morgan Rogers (he/him) said:
Sufficient for what? I'm not interested in just "knowing what is true or false or what are the answers to precise questions". If all you learned at school was a series of statements and whether they were true or false, it must have been quite an unpleasant experience.
I mean sufficient for practical purposes. If you can get answers to "What is a good plan to optimize the pleasantness of my day today while making my life as long as possible?", "Should I study this math question or that one? Take a nap this afternoon? Go for a walk at 10am? Please don't be too precise, and leave me some choices to make, so that I can use my free will, which makes me feel better", etc., what else do you want?
I agree that being able to ask questions and make analogies is also something very cool to learn.
It sounds like you already go to AI to get answers to those questions. They don't seem to be the kind of questions that AI could possibly answer scientifically or say definitively are "true or false" so they don't seem particularly relevant to the problems I want to discuss here.
I'm done. I'll let you discuss whatever is relevant to your problems. :)
It's an interesting question. I believe that AI-generated scientific papers are currently (mostly?) junk quality, but I also think it is somewhat likely that AI will be able to generate decent quality papers within the next decade. For me, the question of "How to manage a flood of junk papers?" is a relevant question. But I suspect that it may be replaced by a broader question perhaps within a decade: "How can humans remain the judges and managers of scientific knowledge?".
To speak to the specific topic raised, I think it might be helpful to draw an analogy to online gaming communities where computer programs imitating real players is already a huge problem. Player characters controlled by programs are usually called "bots", and it's an unending struggle to (1) detect and ban bots and (2) mitigate the impact of the bots that inevitably escape detection. Here are two measures that have - to my understanding - helped in a gaming context:
Analogous measures in the context of paper submission could include:
All of these measures have serious problems with them!
I hadn't even thought about bot authors being the problem. I assumed it would be real people using AI to pump out papers in their name, and then submitting it themselves. What benefit is gained by inventing a fake person and having them produce fake work? I must misunderstand something.
Joe Moeller said:
What benefit is gained by inventing a fake person and having them produce fake work? I must misunderstand something.
I can imagine someone wanting to hack the "citation game" by making a lot of fake persons publish fake papers that cite your work.
Another is simply a DoS attack: flood a journal to prevent it from publishing and so damage its reputation. (I don't know how plausible this scenario is, but who knows.)
But your question raises, more or less, the question of defining the threat model.
I assume the premise at the start of this thread is not autonomous bots generating PDFs that look like papers, but humans using a genAI-powered tool to rapidly generate such artefacts and also handle the submission process, so that the speed of nonsense generation can go up by an order of magnitude. In some countries there are very strong incentives, even financial, to maximise certain metrics
https://mathstodon.xyz/@highergeometer/113541639662531977
and there are people who feel that is more important than other scholarly considerations.
If the AI-generated text is nearly indistinguishable from solid work, it still needs to be refereed. Even if it is correct, it still needs to be refereed. Unless AI tools can actually referee papers too, which I suspect is even harder than generating them, the problem I believe Morgan is highlighting is one of "humans can do the work, so that machines have time to think", to quote B(if)tek
Trust in the literature is important to maintain, at least at or above current levels. This was something psychology had to grapple with a bit over a decade ago, with the replication crisis, and that was at regular human pace of generating flaky papers.
Peva Blanchard said:
Joe Moeller said:
What benefit is gained by inventing a fake person and having them produce fake work? I must misunderstand something.
I can imagine someone wanting to hack the "citation game" by making a lot of fake persons publish fake papers that cite your work.
Oh yeah, you could even do this without accosting a journal. You just have to accost the arXiv, and Google will update your h-index.
Joe Moeller said:
I hadn't even thought about bot authors being the problem. I assumed it would be real people using AI to pump out papers in their name, and then submitting it themselves. What benefit is gained by inventing a fake person and having them produce fake work? I must misunderstand something.
That's a good point! I was primarily intending to draw an analogy between botting and submission of AI-generated papers. And I think it can be interesting to contemplate such an analogy. That being said, I did drift a little bit in my post above - I started considering, at least to some extent, the problem of AI-generated papers being submitted under fake credentials. I don't think that's a big problem currently; perhaps it could become a larger problem in the future if AI systems start being given additional agency.
I got curious and browsed arXiv's blog with the keyword "chat gpt". I found one result announcing their updated content moderation policy.
Here is the excerpt.
Policy for authors’ use of generative AI language tools
David Egolf said:
I started considering, at least to some extent, the problem of AI-generated papers being submitted under fake credentials. I don't think that's a big problem currently; perhaps it could become a larger problem in the future if AI systems start being given additional agency.
I think people would rather garner the credit/brownie points/citations/h-indices etc for themselves, rather than submit papers under a fake identity and get no benefit from whatever metrics purport to measure scholarly output.
Yes, unless there are purely malicious actors who want to bring science to its knees by flooding journals with second-rate papers. Which is actually an interesting thought, at least good enough for a short story. But it's easier to see people cranking out piles of papers and submitting them to journals under their own name.
With @Joe Moeller , we discussed above the case of a malicious actor (a real human) making a lot of fake papers under fake names to cite their work. It is even possible to do that with the arXiv and have Google update your h-index.
By the way, when imagining scenarios of flooding journals with second-rate papers, I don't think they are very unlikely for some organizations. For instance, the "science" produced by, e.g., tobacco companies has been used to mute findings related to the impact of smoking on health. (Other examples would include, I guess, oil companies and climate change, or more recently social networks and data rights.) Gen-AI tools have lowered the barrier to such coordinated manipulation.
As a moderator I dealt with a series of odd postings from a new user who offered to answer any open question in mathematics. Instead of quickly figuring out what was going on, I ended up going down a deep rabbit hole. I began to get so many more strange posts from new users that it bogged down the entire moderation process. I suspect the use cases discussed here don't cover the variety of motivations in play.
It seems we have understood some of the reasons why this is very likely to become a problem, and for those of us in academia I again want to stress it is a problem that will directly affect us. So what do we do about it?
The policy @Peva Blanchard quoted seems sensible as a first step: oblige authors to disclose their use of GenAI. That is obviously hard to enforce if sufficiently obfuscated. For similar reasons, an individual policy of "I will not review AI-authored articles" is hard to apply consistently, although I am considering putting a statement to that effect on my personal webpage anyway.
An approach which might work is a "one strike policy" against mistakes that AI is known to make but which a human could only make with significant effort or deception. These include:
- Inventing citations
- Citing articles which do not contain results that they are claimed to contain (although I have reviewed human-authored papers which failed this one)
- Logically incoherent proofs
- Non-stated or non-sourced definitions of terms (one needs to allow some wiggle room in this one...)
Checking any of these things beyond the requires significant effort.
Again, that's just at the base level of checking whether the papers are formally correct; none of that does much to staunch the potential volume or guarantee the scientific value of the papers produced (the latter being potentially very subjective).
Jean-Baptiste Vienney said:
I think it will not be too difficult to identify whether a paper is submitted by a researcher or an AI. It suffices to use Google to find whether the name of the author corresponds to a real person or not.
For the foreseeable future, the risk is that real people will use AI to submit junk to journals/conferences, not AI agents. Also it is fairly easy to fake one's existence online... see bots on Twitter
Jean-Baptiste Vienney said:
it could be helpful to obligate authors to use ORCID identifiers.
This is not a bad idea, at least because one could softban authors that submit slop
Morgan Rogers (he/him) said:
oblige authors to disclose their use of GenAI
I think this is a fine rule but also probably practically useless. Either AI papers are SoTA crackpotty junk that a reviewer can spot by quickly skimming the paper, or they are significantly better at pretending to be substantial. In the first case, if someone is willing to submit such slop I doubt they would self-denounce AI usage, so you would end up reviewing these papers anyway. In the second case, the incentive to lie and pass the work off as yours is much higher, and you would end up reviewing these papers anyway.
In any case, asking people to 'not do the bad thing' is usually quite useless to stop them from doing the bad thing. Instead, incentive and reputation systems can be used to shape behaviour, and academia has long relied (with varying results) on such systems. Citation metrics and networking are ways to measure and boost your reputation in a field, which is ultimately what makes your science relevant (if someone proves the abc conjecture in a forest and no one is there to hear, did they actually prove it?), and people are disincentivized from dabbling in scientific fraud because they know they are going to completely burn their reputation if caught.
So I think the solution to the rising tide of AI slop is to build reputation networks we can trust: have public lists of authors who are known to be fraudulent, make submissions conditional on a reference (as arXiv does with endorsements), etc. These are not perfect systems and I'm just pointing in a general direction: we should be very careful to avoid gatekeeping real people from disadvantaged/nontraditional backgrounds. But you get the idea.
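To make "submissions conditional on a reference" concrete, here is a very rough sketch (names and data invented for illustration, and this is not how arXiv actually implements endorsements): accept a submission only if its author is reachable, through a chain of endorsements, from a seed set of trusted authors.

```haskell
-- Rough sketch of an endorsement-based reputation network (illustration only).
import qualified Data.Map.Strict as M
import qualified Data.Set as S

type Author = String

-- endorser -> authors they have endorsed
type Endorsements = M.Map Author [Author]

-- Everyone reachable from the seed set via endorsements (breadth-first).
trustedFrom :: S.Set Author -> Endorsements -> S.Set Author
trustedFrom seeds endorse = go seeds (S.toList seeds)
  where
    go trusted []       = trusted
    go trusted (a : as) =
      let new = [ b | b <- M.findWithDefault [] a endorse
                    , not (b `S.member` trusted) ]
      in go (foldr S.insert trusted new) (new ++ as)

canSubmit :: S.Set Author -> Endorsements -> Author -> Bool
canSubmit seeds endorse author = author `S.member` trustedFrom seeds endorse

main :: IO ()
main = do
  let endorse = M.fromList [("Alice", ["Bob"]), ("Bob", ["Carol"])]
      seeds   = S.fromList ["Alice"]
  print (canSubmit seeds endorse "Carol")    -- True: Alice -> Bob -> Carol
  print (canSubmit seeds endorse "Mallory")  -- False: no endorsement chain
```

The obvious failure mode is the one already mentioned: any such gate also keeps out real people without connections, so the seed set and endorsement criteria matter a lot.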
Morgan Rogers (he/him) said:
An approach which might work is a "one strike policy" against mistakes that AI is known to make but which a human could only make with significant effort or deception. These include:
- Inventing citations
- Citing articles which do not contain results that they are claimed to contain (although I have reviewed human-authored papers which failed this one)
- Logically incoherent proofs
- Non-stated or non-sourced definitions of terms (one needs to allow some wiggle room in this one...)
Checking any of these things beyond the requires significant effort.
These seem to be excellent tasks for AI lol, but techbros seem to be more preoccupied with generating bullshit than with actually making helpful stuff (or if they do, they are way quieter about it than the BS generators)
A cynical observer might identify a pattern of creating a problem in order to build a market for the solution... :melting_face:
Morgan Rogers (he/him) said:
An approach which might work is a "one strike policy" against mistakes that AI is known to make but which a human could only make with significant effort or deception. These include:
- Inventing citations
- Citing articles which do not contain results that they are claimed to contain (although I have reviewed human-authored papers which failed this one)
- Logically incoherent proofs
- Non-stated or non-sourced definitions of terms (one needs to allow some wiggle room in this one...)
Humans are guilty of the last two quite often in my experience. Maybe a good policy would be fairly merciless against behavior that seems 'dishonest', regardless of whether a human is doing it or an AI, while being merciful to things that feel like 'honest errors'.
Checking any of these things beyond the requires significant effort.
Beyond the what?
beyond the first :big_smile: it's fairly easy to check if citations exist, compared with checking whether they contain a given piece of information!
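And the existence check really is mechanizable; a hedged sketch (the heuristics, that a DOI starts with "10." and an arXiv ID follows "arXiv:", are mine and deliberately crude): harvest identifiers from a .bib file and print the resolver URLs, leaving the hard part, checking that the cited paper actually contains the claimed result, to a human.

```haskell
-- Crude citation-existence helper (illustration only).
import Data.Char (isSpace)
import Data.List (isPrefixOf, tails)
import System.Environment (getArgs)

-- Every maximal token following a given prefix in the text.
harvest :: String -> String -> [String]
harvest prefix txt =
  [ prefix ++ takeWhile keep rest
  | t <- tails txt
  , prefix `isPrefixOf` t
  , let rest = drop (length prefix) t
  ]
  where
    keep c = not (isSpace c) && c `notElem` ",}\"'"

-- Turn harvested DOIs and arXiv IDs into URLs that should resolve.
resolverUrls :: String -> [String]
resolverUrls bib =
     [ "https://doi.org/" ++ d              | d <- harvest "10."    bib ]
  ++ [ "https://arxiv.org/abs/" ++ drop 6 a | a <- harvest "arXiv:" bib ]

main :: IO ()
main = do
  [bibFile] <- getArgs               -- e.g. ./check-citations refs.bib
  bib       <- readFile bibFile
  mapM_ putStrLn (resolverUrls bib)  -- pipe these to curl, or open by hand
```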
Matteo Capucci (he/him) said:
asking people to 'not do the bad thing' is usually quite useless to stop them from doing the bad thing.
I don't think that's entirely true. In many cases it's important to state clearly what is considered to be bad, since some people may not want to do bad things, but may be unclear on whether something is bad or not. (Cf. academic integrity policies on syllabi.) Maybe that isn't relevant here, I don't know. But having a clear public statement of what bad behavior is is also helpful to justify punitive actions taken against those who engage in it.
Of course!
Mike Shulman said:
But having a clear public statement of what bad behavior is is also helpful to justify punitive actions taken against those who engage in it.
So where should we draw the line on what constitutes bad behaviour? Is it a matter of forbidding certain types of tools or limiting the volume/proportion of AI-generated material? Or are there other criteria to consider?
I'd like to build on Morgan's earlier assertion that the aim of science is "not to manufacture knowledge, but to achieve understanding".
We might sharpen this a little to the assertion that the aim of science is "not to publish papers, but to achieve understanding".
This is relevant because the specific concerning thing that LLMs are capable of doing is generating research papers which are, at a glance, plausible. This is concerning largely because of the strain it might place on the formal peer-review system for publishing papers.
It seems to me that this is only a problem because of the many perverse incentives to publish as many papers as possible. One way to fix this problem is to change the way we evaluate scientific output. Currently, a significant part of the reason to publish papers is to accrue "science points", which are then used to make decisions about hiring etc.
I think making changes here would be a good idea anyway. I think our lives would be better, and the quality of our research higher, without the pressure to publish as many papers as possible.
So, it may be a good thing if AI completely destroys the publishing system. It might push us into the kind of collective action needed to reform the system into something less absurd.
@Morgan Rogers (he/him) How would you tell if I understood? One measure of understanding is the ability to present a concept / theory in terminology (language / universe of discourse) other than the one it was initially presented in (during the training phase, so to speak). Given the drift of our collective consciousness towards particulars and away from generals, I feel the need to give an example. For example, uncompetitive antagonism of ion channels such as NMDA receptors by ions such as Mg2+ is explained in terms of rates of binding and unbinding and the flow of various ions such as Na+, Ca2+, etc. Now, here's how I explained it to a rickshaw puller in my hometown, when asked what I do: I study how the traffic on a highway of speeding cars comes to a standstill when a herd of leisurely strolling cows joins the party, so to speak ;)
Morgan Rogers (he/him) said:
In the context of Science vis-à-vis AI, there is only one question: how much of the science can AI abstract statistically (neural networks vs. AI/symbolic/Minsky et al.)? To make it even more immediate, how much of the workings of a mobile do you need to know to use one? Or, bringing it closer to our category theory, how far can math go surfing the superficial symbolic languages (à la presentations), oblivious to underlying concepts: NOT VERY FAR (cf. cylindric algebra, pp. 194-195, 236; at least going by Grothendieck's descent, Professors Bastiani & Ehresmann's sketches, and Professor F. William Lawvere's models of theories), while highlighting relative nature (given the questionable exercise of choosing a global unit). Summing it all up, I'd conclude: QUITE FAR :)
P.S. SCIENCE IS EVER-PROPER ALIGNMENT OF REASON WITH EXPERIENCE.
What's your understanding of science, just to make sure we are on the same page (same level of abstraction ;)
P.P.S. What's true of mathematics need not be so of language in general (as you all know):
For example, the condition for syllogistic reasoning and composition of functions is the same: domain(2nd function) = codomain(1st function), which gives us the composite function: domain(1st function) --> codomain(2nd function). So is the case with syllogistic reasoning: Apples are Fruits AND Fruits are Edible = Apples are Edible. But we run into trouble when we try to abstract statistically. For instance: Apples are Fruits AND Mangoes are Yellow = Apples are Yellow is one possible "wrong" abstraction, which we can correct by structuring reinforcement learning to make the system recognize the condition: subject(2nd proposition) = object(1st proposition). Even after this corrective measure, there is still a possibility of error: Apple is red AND Red is stop-signal = Apple is stop-signal. Note that we don't have these problems with composition of functions, which raises the possibility of statistical abstraction of the architecture of mathematics (for all mathematical objects and operations are universal mapping properties, i.e., the best / energy-minimum configurations).
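A toy Haskell rendering of that analogy (propositions modelled as functions between one-element types, names mine): the valid syllogism is exactly a composite that type-checks, and the "wrong" abstraction is a composite the type checker rejects because codomain(1st) does not match domain(2nd).

```haskell
-- Syllogisms as function composition (illustration only).
data Apple  = Apple
data Fruit  = Fruit
data Edible = Edible
data Mango  = Mango
data Yellow = Yellow

applesAreFruits :: Apple -> Fruit
applesAreFruits _ = Fruit

fruitsAreEdible :: Fruit -> Edible
fruitsAreEdible _ = Edible

mangoesAreYellow :: Mango -> Yellow
mangoesAreYellow _ = Yellow

-- Apples are Fruits AND Fruits are Edible = Apples are Edible:
-- the composite exists because codomain(1st) = domain(2nd).
applesAreEdible :: Apple -> Edible
applesAreEdible = fruitsAreEdible . applesAreFruits

-- The "wrong abstraction" simply does not type-check:
-- applesAreYellow :: Apple -> Yellow
-- applesAreYellow = mangoesAreYellow . applesAreFruits
--   (rejected: Fruit is not Mango)

main :: IO ()
main = case applesAreEdible Apple of
  Edible -> putStrLn "Apples are Edible"
```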
Chad Nester said:
So, it may be a good thing if AI completely destroys the publishing system. It might push us into the kind of collective action needed to reform the system into something less absurd.
If academia survives the destruction, that is :grimacing:
Posina Venkata Rayudu said:
Morgan Rogers (he/him) How would you tell if I understood?
I don't need a special test, you can just tell me (and similarly for the collective). If papers aren't advancing the understanding of individual people, let alone the community, then they have no value.
What do you mean by a paper which doesn't advance the understanding of individual people or the community? If a paper is peer-reviewed and published in a journal (whether it is written by a human or an AI), how could it be such a paper?
at least in CS, lots of papers get published in peer reviewed venues that state 'easy' results that might not be accepted in a more prestigious venue, because e.g. tenure committees can't tell the difference
So if I understand correctly, an example of a paper which does not advance the understanding of the community could be a paper which states only 'easy' results. But if it's too easy, then the reviewers could say in their reviews that the paper should not be accepted?
To me, the serious reason why a researcher should be afraid of AI is that AI could perhaps diminish his/her/their perceived (or real) value in society, not that AI is going to endanger science.
(But I have no idea if it will happen or not, honestly.)
In 2019, in response to a MathOverflow question "Why doesn't mathematics collapse even though humans quite often make mistakes in their proofs?", I wrote
I think another reason that mathematics doesn't collapse is that the fundamental content of mathematics is ideas and understanding, not only proofs. If mathematics were done by computers that mindlessly searched for theorems and proof but sometimes made mistakes in their proofs, then I expect that it would collapse. But usually when a human mathematician proves a theorem, they do it by achieving some new understanding or idea, and usually that idea is "correct" even if the first proof given involving it is not.
At the time, the idea of "computers that mindlessly searched for theorems and proof but sometimes made mistakes in their proofs" seemed fanciful. But today, that's exactly what LLMs could do if let loose on mathematics, and I do think this could in principle endanger the scientific enterprise. Aside from the question of dealing with volume, it's not reasonable to expect human referees to catch "random" errors of the sort that LLMs will make. Human referees can't even reliably catch non-random errors, but that doesn't make mathematics collapse because, as I said, the humans writing the proofs usually have a correct idea in mind even if their proof was wrong.
Mike Shulman said:
At the time, the idea of "computers that mindlessly searched for theorems and proof but sometimes made mistakes in their proofs" seemed fanciful. But today, that's exactly what LLMs could do if let loose on mathematics, and I do think this could in principle endanger the scientific enterprise.
Not only could they do it, they do do it.
I'm speaking here from my experience as a moderator at MathOverflow. (As some of you know, I retired myself in September from that position, but @David Michael Roberts is one of the current moderators and will probably be able to speak to the current situation.) It wasn't even so much that LLMs would make mistakes in proofs; it was more that they spewed out garbage and nonsense when it comes to the high-level stuff. Because, after all, they don't actually understand what they're talking about. And it became a pestilence. Maybe it was the fault of MO users who queried ChatGPT about MO questions and then posted the output under their accounts, for not knowing how to use ChatGPT more skillfully; I couldn't say really. What I can say is that it drove moderators like me, and users who want to keep MO reliable and high-level, nuts (and it had something to do with my stepping down, frankly).
So what Mike is speculating is actually dangerously close to the truth of the matter.
Todd Trimble said:
What I can say is that it drove moderators like me, and users who want to keep MO reliable and high-level, nuts (and it had something to do with my stepping down, frankly).
I feel sorry for you and all the other moderators...
How is MO handling the issue? Is it just resting on the moderators' shoulders?
Not at all -- there are a lot of active MO users who help keep the place clean by flagging problematic actions for the moderators' attention.
High-reputation users have soft moderation tools (vote to close, flag-as-spam, for instance), and since their expertise is broader than that of the small group of elected mods, this is a big help, as they can spot nonsense in what to a non-specialist is entirely plausible symbol salad.
Since we enforced registration to participate, the amount of AI nonsense has definitely gone down. (We were also getting AI questions, not just answers...)
Jean-Baptiste Vienney said:
So if I understand correctly, an example of a paper which does not advance the understanding of the community could be a paper which states only 'easy' results. But if it's too easy, then the reviewers could say in their reviews that the paper should not be accepted?
This ignores the point I made about the increasing volume of papers that reviewers have to read. They could say it's not good enough, but they have less time per paper on average to make such a judgement, and as Mike explained, less experience catching the type of error that an AI trained to write plausible-sounding papers will make. In the worst-case scenario, it's possible for junk to pass this filter (again, this has already happened and been reported on in several fields).
Even if it's formally correct and a real person is taking credit for it, what is the value? A well-meaning person might communicate this work to the community in other ways, in which case there is some potential value, but someone who is using AI to generate papers to artificially boost their reputation (the direct equivalent of what Todd was reporting on MO) would be disincentivised from drawing more than superficial attention to such work in case their false authorship were to be revealed, undermining that reputation.
someone who is using AI to generate papers to artificially boost their reputation
And that too has been happening, of course. I'm thinking of a particular case that David R and I know from MO, now under a very long ban.
Of course, all this is peanuts compared to the criminal use of AI in insurance (I suppose all of you have heard about the United Health Care case and the targeted assassination of its CEO). I can't think of words strong enough to describe how shameful and wicked all this is.
Todd Trimble said:
Of course, all this is peanuts compared to the criminal use of AI in insurance (I suppose all of you have heard about the United Health Care case and the targeted assassination of its CEO).
I haven't, and the part of the article you shared that is available for free doesn't mention AI; what's the story? (Briefly, so as to not get too off topic :wink:)
It doesn't exactly fall under this topic, but I just read this Nature article about academic publishers licensing their content to train gen-ai models, and thought it would be good to share it here.
Indeed, a most unsavoury can of worms