My wife sent me this, a paper heralding "the AI scientist", an ensemble of AI tools allegedly capable of automating the whole research ecosystem. For those on Bluesky, this has been critiqued admirably by Carl T. Bergstrom, but it is a sign that scientific communities need to move fast to avoid being submerged by a tidal wave of AI-generated junk papers. I know from direct experience that these are already showing up in isolation, but the construction of a system for rolling them out is an alarming development.
Since several organizers of conferences and editors of journals are present here, I wanted to launch a discussion of what we can collectively do to manage this trend.
Ok, it doesn’t work well now. But it is not impossible that AI will end up replacing researchers ahah (however ChatGPT says it will not happen tomorrow).
I would have liked to have more details about what Carl T. Bergstrom was thinking about when he wrote: "Given what I know of fundamental limits to what LLMs can do, I see no reason to agree."
What are these fundamental limits please?
I think nobody knows whether there are such fundamental limits: maybe there are, maybe there aren't.
One way to understand LLM limitations and abilities is to actually examine their code. GPT2 is only 400 lines of Haskell (https://github.com/tensor-fusion/GPT-Haskell), and is also available as an Excel spreadsheet (with training data) and a giant recursive SQL query. You can see for yourself whether the code is "thinking" or "averaging" (my favorite metaphor) or something else entirely. And then ask and answer math questions about the computational model, such as whether it can compute all functions, etc. I understand many LLM architectures cannot be 'Turing complete' for example, while others are.
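To make that concrete, here is a toy sketch of a single attention head, written from scratch (this is not the GPT-Haskell code, and it omits learned projections, multiple heads, and positional encodings): the output is literally a softmax-weighted average of the value vectors, which is where my "averaging" metaphor comes from.

```haskell
-- Toy single attention head (illustration only, not the GPT-Haskell code).
type Vec = [Double]

dot :: Vec -> Vec -> Double
dot u v = sum (zipWith (*) u v)

softmax :: [Double] -> [Double]
softmax xs = map (/ total) exps
  where
    m     = maximum xs                  -- subtract the max for stability
    exps  = map (\x -> exp (x - m)) xs
    total = sum exps

-- Attend from one query to a list of (key, value) pairs:
-- score each key against the query, softmax the scores,
-- then take the weighted average of the values.
attend :: Vec -> [(Vec, Vec)] -> Vec
attend q kvs = foldr1 (zipWith (+)) weighted
  where
    d        = fromIntegral (length q)
    scores   = [ dot q k / sqrt d | (k, _) <- kvs ]
    weights  = softmax scores
    weighted = [ map (w *) v | (w, (_, v)) <- zip weights kvs ]

main :: IO ()
main = print (attend [1, 0] [([1, 0], [10, 0]), ([0, 1], [0, 10])])
-- prints roughly [6.7, 3.3]: a weighted average of the two value vectors
```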
The problem is not whether AI could eventually produce publication-worthy papers (putting aside the fact that some AI-written papers have been accepted to journals), it's that we do not have the human time resources or confidence in existing AI to review papers that can be produced with this technology at a much higher rate than human-written papers can be.
The aim of science is not to manufacture knowledge, it is to achieve understanding, whether by humans or (in the AI fantasy future) by systems capable of leveraging that knowledge for our benefit; to me, the latter is certainly a lower priority. Having large volumes of imitation academic papers which we don't have confidence in the content of because we don't have experts with enough time to look at them will serve only to undermine the scientific process as a whole. Our jobs are on the line, not because we are going to be replaced but because our good-faith research output risks being drowned out by this firehose.
Ryan Wisnesky said:
I understand many LLM architectures cannot be 'Turing complete' for example, while others are.
Very general communication complexity considerations (specifically, relating to the computation of the attention softmax) show that a single transformer attention layer cannot reliably compute large function composition queries. An example of a composition query is the concatenation of A, B, and Q, where A is of the form “Alice’s mother is Andrea; Bob’s mother is Betsy; …”, B is of the form “Andrea is a teacher; Betsy is a surgeon; …”, and Q is of the form “What is Alice’s mother’s job?” The linked paper shows that if A and B are large—in a way irrelevant to context windows—then a transformer cannot answer Q reliably.
The same paper shows that the computation of a transformer with L layers on a prompt of length n can be performed using O(L log n) bits of memory under various polynomial scale assumptions. As a corollary, multi-layer transformers cannot solve the notoriously easy problems 2-SAT or Horn-SAT at scale unless L = NL.
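A rough formalization of those two claims, in my own notation (not the paper's): the composition query packages two functions and asks for their composite,
$$A \rightsquigarrow f : X \to Y \ (\text{child} \mapsto \text{mother}), \qquad B \rightsquigarrow g : Y \to Z \ (\text{person} \mapsto \text{job}), \qquad Q = (g \circ f)(\text{Alice}),$$
and the lower bound says a single attention layer cannot answer $Q$ reliably once $|X|$ is large relative to the layer's number of heads, embedding dimension, and numerical precision. For the second claim: under the paper's polynomial-scale assumptions, the output of an $L$-layer transformer on a length-$n$ prompt is computable in $\mathsf{SPACE}(O(L \log n))$, so for constant $L$ everything it decides lies in logspace $\mathsf{L}$; since 2-SAT is NL-complete, such a transformer cannot solve 2-SAT unless $\mathsf{L} = \mathsf{NL}$.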
On the other hand, the representational power of multi-head transformer architectures is bounded below by n-gram models, which can perform impressively in practice.
Morgan Rogers (he/him) said:
Having large volumes of imitation academic papers which we don't have confidence in the content of because we don't have experts with enough time to look at them will serve only to undermine the scientific process as a whole. Our jobs are on the line, not because we are going to be replaced but because our good-faith research output risks being drowned out by this firehose.
I think it will not be too difficult to identify whether a paper is submitted by a researcher or an AI. It suffices to use Google to find whether the name of the author corresponds to a real person or not. But to deal with cases where the author does not have much of an online presence, and to save time, it could be helpful to obligate authors to use ORCID identifiers.
It's going to be difficult. Off the top of my head, I can only see band-aid, police-like actions:
Maybe keep an eye on watermarking technologies (although there are some "conflicts of interest" ...)
I am, unfortunately, not very optimistic for the mid-term: I bet there will be an arms race between generators vs detectors.
Jean-Baptiste Vienney said:
I think it will not be too difficult to identify whether a paper is submitted by a researcher or an AI.
The question is not of detecting them, but of putting a policy in place of what to do with them, and under what conditions; that's what I want to discuss.
Morgan Rogers (he/him) said:
The aim of science is not to manufacture knowledge, it is to achieve understanding
But what really is understanding? It is a vague notion. Knowing what is true or false, or what the answers to precise questions are, is sufficient.
Sufficient for what? I'm not interested in just "knowing what is true or false or what are the answers to precise questions". If all you learned at school was a series of statements and whether they were true or false, it must have been quite an unpleasant experience.
There are further significant ethical questions around plagiarism, considering that a number of medium-sized academic publishers are selling books and research papers to AI companies for their training data.
Morgan Rogers (he/him) said:
Sufficient for what? I'm not interested in just "knowing what is true or false or what are the answers to precise questions". If all you learned at school was a series of statements and whether they were true or false, it must have been quite an unpleasant experience.
I mean sufficient for practical purposes. If you can get answers to "What is a good plan to optimize the pleasantness of my day today while making my life as long as possible?", "Should I study this math question or that one? Take a nap this afternoon? Go for a walk at 10am? Please don't be too precise, and leave me some choices to make, so that I can use my free will, which makes me feel better", etc., what else do you want?
I agree that being able to ask questions and make analogies is also something very cool to learn.
It sounds like you already go to AI to get answers to those questions. They don't seem to be the kind of questions that AI could possibly answer scientifically or say definitively are "true or false" so they don't seem particularly relevant to the problems I want to discuss here.
I'm done. I'll let you discuss whatever is relevant to your problems. :)
It's an interesting question. I believe that AI-generated scientific papers are currently (mostly?) junk quality, but I also think it is somewhat likely that AI will be able to generate decent quality papers within the next decade. For me, the question of "How to manage a flood of junk papers?" is a relevant question. But I suspect that it may be replaced by a broader question perhaps within a decade: "How can humans remain the judges and managers of scientific knowledge?".
To speak to the specific topic raised, I think it might be helpful to draw an analogy to online gaming communities where computer programs imitating real players is already a huge problem. Player characters controlled by programs are usually called "bots", and it's an unending struggle to (1) detect and ban bots and (2) mitigate the impact of the bots that inevitably escape detection. Here are two measures that have - to my understanding - helped in a gaming context:
Analogous measures in the context of paper submission could include:
All of these measures have serious problems with them!
I hadn't even thought about bot authors being the problem. I assumed it would be real people using AI to pump out papers in their name, and then submitting it themselves. What benefit is gained by inventing a fake person and having them produce fake work? I must misunderstand something.
Joe Moeller said:
What benefit is gained by inventing a fake person and having them produce fake work? I must misunderstand something.
I can imagine someone wanting to hack the "citation game" by making a lot of fake persons publish fake papers that cite your work.
Another is simply a DoS attack: flood a journal to prevent it from publishing and so damage its reputation. (I don't know how plausible this scenario is, but who knows.)
But your question raises, more or less, the question of defining the threat model.
I assume the premise at the start of this thread is not autonomous bots generating PDFs that look like papers, but humans using a genAI-powered tool to rapidly generate such artefacts and also handle the submission process, so that the speed of nonsense generation can go up by an order of magnitude. In some countries there are very strong incentives, even financial, to maximise certain metrics
https://mathstodon.xyz/@highergeometer/113541639662531977
and there are people who feel that is more important than other scholarly considerations.
If the AI-generated text is nearly indistinguishable from solid work, it still needs to be refereed. Even if it is correct, it still needs to be refereed. Unless AI tools can actually referee papers too, which I suspect is even harder than generating them, the problem I believe Morgan is highlighting is one of "humans can do the work, so that machines have time to think", to quote B(if)tek
Trust in the literature is important to maintain, at least at or above current levels. This was something psychology had to grapple with a bit over a decade ago, with the replication crisis, and that was at regular human pace of generating flaky papers.
Peva Blanchard said:
Joe Moeller said:
What benefit is gained by inventing a fake person and having them produce fake work? I must misunderstand something.
I can imagine someone wanting to hack the "citation game" by making a lot of fake persons publish fake papers that cite your work.
Oh yeah, you could even do this without accosting a journal. You just have to accost the arXiv, and Google will update your h-index.
Joe Moeller said:
I hadn't even thought about bot authors being the problem. I assumed it would be real people using AI to pump out papers in their name, and then submitting it themselves. What benefit is gained by inventing a fake person and having them produce fake work? I must misunderstand something.
That's a good point! I was primarily intending to draw an analogy between botting and submission of AI-generated papers. And I think it can be interesting to contemplate such an analogy. That being said, I did drift a little bit in my post above - I started considering, at least to some extent, the problem of AI-generated papers being submitted under fake credentials. I don't think that's a big problem currently; perhaps it could become a larger problem in the future if AI systems start being given additional agency.
I got curious and browsed arXiv's blog with the keyword "chat gpt". I found one result announcing their updated content moderation policy.
Here is the excerpt.
Policy for authors’ use of generative AI language tools
David Egolf said:
I started considering, at least to some extent, the problem of AI-generated papers being submitted under fake credentials. I don't think that's a big problem currently; perhaps it could become a larger problem in the future if AI systems start being given additional agency.
I think people would rather garner the credit/brownie points/citations/h-indices etc for themselves, rather than submit papers under a fake identity and get no benefit from whatever metrics purport to measure scholarly output.
Yes, unless there are purely malicious actors who want to bring science to its knees by flooding journals with second-rate papers. Which is actually an interesting thought, at least good enough for a short story. But it's easier to see people cranking out piles of papers and submitting them to journals under their own name.
With @Joe Moeller , we discussed above the case of a malicious actor (a real human) making a lot of fake papers under fake names to cite their work. It is even possible to do that with the arXiv and have Google update your h-index.
By the way, when imagining scenarios of flooding journals with second-rate papers, I don't think they are very unlikely for some organizations. For instance, the "science" produced by, e.g., tobacco companies has been used to mute findings related to the impact of smoking on health. (Other examples would include, I guess, oil companies and climate change, or more recently social networks and data rights.) Gen-AI tools have lowered the barrier to such coordinated manipulation.
As a moderator I dealt with a series of odd postings from a new user who offered to answer any open question in mathematics. Instead of quickly figuring out what was going on, I ended up going down a deep rabbit hole. I began to get so many more strange posts from new users that it bogged down the entire moderation process. I suspect the use cases discussed here don't cover the variety of motivations in play.
It seems we have understood some of the reasons why this is very likely to become a problem, and for those of us in academia I again want to stress it is a problem that will directly affect us. So what do we do about it?
The policy @Peva Blanchard quoted seems sensible as a first step: oblige authors to disclose their use of GenAI. That is obviously hard to enforce if sufficiently obfuscated. For similar reasons, an individual policy of "I will not review AI-authored articles" is hard to apply consistently, although I am considering putting a statement to that effect on my personal webpage anyway.
An approach which might work is a "one strike policy" against mistakes that AI is known to make but which a human could only make with significant effort or deception. These include:
- Inventing citations
- Citing articles which do not contain results that they are claimed to contain (although I have reviewed human-authored papers which failed this one)
- Logically incoherent proofs
- Non-stated or non-sourced definitions of terms (one needs to allow some wiggle room in this one...)
Checking any of these things beyond the requires significant effort.
Again, that's just at the base level of checking whether the papers are formally correct; none of that does much to staunch the potential volume or guarantee the scientific value of the papers produced (the latter being potentially very subjective).
Jean-Baptiste Vienney said:
I think it will not be too difficult to identify whether a paper is submitted by a researcher or an AI. It suffices to use Google to find whether the name of the author corresponds to a real person or not.
For the foreseeable future, the risk is that real people will use AI to submit junk to journals/conferences, not AI agents. Also it is fairly easy to fake one's existence online... see bots on Twitter
Jean-Baptiste Vienney said:
it could be helpful to obligate authors to use ORCID identifiers.
This is not a bad idea, at least because one could softban authors that submit slop
Morgan Rogers (he/him) said:
oblige authors to disclose their use of GenAI
I think this is a fine rule but also probably practically useless. Either AI papers are SoTA crackpotty junk that a reviewer can spot by quickly skimming the paper, or they are significantly better at pretending to be substantial. In the first case, if someone is willing to submit such slop I doubt they would self-denounce AI usage, so you would end up reviewing these papers anyway. In the second case, the incentive to lie and pass the work off as yours is much higher, and you would end up reviewing these papers anyway.
In any case, asking people to 'not do the bad thing' is usually quite useless to stop them from doing the bad thing. Instead, incentive and reputation systems can be used to shape behaviour, and academia has long relied (with varying results) on such systems. Citation metrics and networking are ways to measure and boost your reputation in a field, which is ultimately what makes your science relevant (if someone proves the abc conjecture in a forest and no one is there to hear, did they actually prove it?), and people are disincentivized from dabbling in scientific fraud because they know they are going to completely burn their reputation if caught.
So I think the solution to the rising tide of AI slop is to build reputation networks we can trust: have public lists of authors who are known to be fraudulent, make submissions conditional on a reference (as arXiv does with endorsements), etc. These are not perfect systems and I'm just pointing in a general direction: we should be very careful to avoid gatekeeping real people from disadvantaged/nontraditional backgrounds. But you get the idea.
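To make "submissions conditional on a reference" concrete, here is a very rough sketch (names and data invented for illustration, and this is not how arXiv actually implements endorsements): accept a submission only if its author is reachable, through a chain of endorsements, from a seed set of trusted authors.

```haskell
-- Rough sketch of an endorsement-based reputation network (illustration only).
import qualified Data.Map.Strict as M
import qualified Data.Set as S

type Author = String

-- endorser -> authors they have endorsed
type Endorsements = M.Map Author [Author]

-- Everyone reachable from the seed set via endorsements (breadth-first).
trustedFrom :: S.Set Author -> Endorsements -> S.Set Author
trustedFrom seeds endorse = go seeds (S.toList seeds)
  where
    go trusted []       = trusted
    go trusted (a : as) =
      let new = [ b | b <- M.findWithDefault [] a endorse
                    , not (b `S.member` trusted) ]
      in go (foldr S.insert trusted new) (new ++ as)

canSubmit :: S.Set Author -> Endorsements -> Author -> Bool
canSubmit seeds endorse author = author `S.member` trustedFrom seeds endorse

main :: IO ()
main = do
  let endorse = M.fromList [("Alice", ["Bob"]), ("Bob", ["Carol"])]
      seeds   = S.fromList ["Alice"]
  print (canSubmit seeds endorse "Carol")    -- True: Alice -> Bob -> Carol
  print (canSubmit seeds endorse "Mallory")  -- False: no endorsement chain
```

The obvious failure mode is the one already mentioned: any such gate also keeps out real people without connections, so the seed set and endorsement criteria matter a lot.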
Morgan Rogers (he/him) said:
An approach which might work is a "one strike policy" against mistakes that AI is known to make but which a human could only make with significant effort or deception. These include:
- Inventing citations
- Citing articles which do not contain results that they are claimed to contain (although I have reviewed human-authored papers which failed this one)
- Logically incoherent proofs
- Non-stated or non-sourced definitions of terms (one needs to allow some wiggle room in this one...)
Checking any of these things beyond the requires significant effort.
These seem to be excellent tasks for AI lol, but techbros seem to be more preoccupied with generating bullshit than with actually making helpful stuff (or if they do, they are way quieter about it than the BS generators)
A cynical observer might identify a pattern of creating a problem in order to build a market for the solution... :melting_face:
Morgan Rogers (he/him) said:
An approach which might work is a "one strike policy" against mistakes that AI is known to make but which a human could only make with significant effort or deception. These include:
- Inventing citations
- Citing articles which do not contain results that they are claimed to contain (although I have reviewed human-authored papers which failed this one)
- Logically incoherent proofs
- Non-stated or non-sourced definitions of terms (one needs to allow some wiggle room in this one...)
Humans are guilty of the last two quite often in my experience. Maybe a good policy would be fairly merciless against behavior that seems 'dishonest', regardless of whether a human is doing it or an AI, while being merciful to things that feel like 'honest errors'.
Checking any of these things beyond the requires significant effort.
Beyond the what?
beyond the first :big_smile: it's fairly easy to check if citations exist, compared with checking whether they contain a given piece of information!
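And the existence check really is mechanizable; a hedged sketch (the heuristics, that a DOI starts with "10." and an arXiv ID follows "arXiv:", are mine and deliberately crude): harvest identifiers from a .bib file and print the resolver URLs, leaving the hard part, checking that the cited paper actually contains the claimed result, to a human.

```haskell
-- Crude citation-existence helper (illustration only).
import Data.Char (isSpace)
import Data.List (isPrefixOf, tails)
import System.Environment (getArgs)

-- Every maximal token following a given prefix in the text.
harvest :: String -> String -> [String]
harvest prefix txt =
  [ prefix ++ takeWhile keep rest
  | t <- tails txt
  , prefix `isPrefixOf` t
  , let rest = drop (length prefix) t
  ]
  where
    keep c = not (isSpace c) && c `notElem` ",}\"'"

-- Turn harvested DOIs and arXiv IDs into URLs that should resolve.
resolverUrls :: String -> [String]
resolverUrls bib =
     [ "https://doi.org/" ++ d              | d <- harvest "10."    bib ]
  ++ [ "https://arxiv.org/abs/" ++ drop 6 a | a <- harvest "arXiv:" bib ]

main :: IO ()
main = do
  [bibFile] <- getArgs               -- e.g. ./check-citations refs.bib
  bib       <- readFile bibFile
  mapM_ putStrLn (resolverUrls bib)  -- pipe these to curl, or open by hand
```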
Matteo Capucci (he/him) said:
asking people to 'not do the bad thing' is usually quite useless to stop them from doing the bad thing.
I don't think that's entirely true. In many cases it's important to state clearly what is considered to be bad, since some people may not want to do bad things, but may be unclear on whether something is bad or not. (Cf. academic integrity policies on syllabi.) Maybe that isn't relevant here, I don't know. But having a clear public statement of what bad behavior is is also helpful to justify punitive actions taken against those who engage in it.
Of course!
Mike Shulman said:
But having a clear public statement of what bad behavior is is also helpful to justify punitive actions taken against those who engage in it.
So where should we draw the line on what constitutes bad behaviour? Is it a matter of forbidding certain types of tools or limiting the volume/proportion of AI-generated material? Or are there other criteria to consider?
I'd like to build on Morgan's earlier assertion that the aim of science is "not to manufacture knowledge, but to achieve understanding".
We might sharpen this a little to the assertion that the aim of science is "not to publish papers, but to achieve understanding".
This is relevant because the specific concerning thing that LLMs are capable of doing is generating research papers which are, at a glance, plausible. This is concerning largely because of the strain it might place on the formal peer-review system for publishing papers.
It seems to me that this is only a problem because of the many perverse incentives to publish as many papers as possible. One way to fix this problem is to change the way we evaluate scientific output. Currently, a significant part of the reason to publish papers is to accrue "science points", which are then used to make decisions about hiring etc.
I think making changes here would be a good idea anyway. I think our lives would be better, and the quality of our research higher, without the pressure to publish as many papers as possible.
So, it may be a good thing if AI completely destroys the publishing system. It might push us into the kind of collective action needed to reform the system into something less absurd.
@Morgan Rogers (he/him) How would you tell if I understood? One measure of understanding is the ability to present a concept / theory in terminology (language / universe of discourse) other than the one it was initially presented in (during the training phase, so to speak). Given the drift of our collective consciousness towards particulars and away from generals, I feel the need to give an example. For example, uncompetitive antagonism of ion channels such as NMDA receptors by ions such as Mg2+ is explained in terms of rates of binding and unbinding and the flow of various ions such as Na+, Ca2+, etc. Now, here's how I explained it to a rickshaw puller in my hometown, when asked what I do: I study how the traffic on a highway of speeding cars comes to a standstill when a herd of leisurely strolling cows joins the party, so to speak ;)
Morgan Rogers (he/him) said:
In the context of Science vis-à-vis AI, there is only one question: how much of the science can AI abstract statistically (neural networks vs. AI/symbolic/Minsky et al.)? To make it even more immediate, how much of the workings of a mobile do you need to know to use one? Or, bringing it closer to our category theory, how far can math go surfing the superficial symbolic languages (à la presentations), oblivious to underlying concepts: NOT VERY FAR (cf. cylindric algebra, pp. 194-195, 236; at least going by Grothendieck's descent, Professors Bastiani & Ehresmann's sketches, and Professor F. William Lawvere's models of theories), while highlighting relative nature (given the questionable exercise of choosing a global unit). Summing it all up, I'd conclude: QUITE FAR :)
P.S. SCIENCE IS EVER-PROPER ALIGNMENT OF REASON WITH EXPERIENCE.
What's your understanding of science, just to make sure we are on the same page (same level of abstraction ;)
P.P.S. What's true of mathematics need not be so of language in general (as you all know):
For example, the condition for syllogistic reasoning and composition of functions is the same: domain(2nd function) = codomain(1st function), which gives us the composite function: domain(1st function) --> codomain(2nd function). So is the case with syllogistic reasoning: Apples are Fruits AND Fruits are Edible = Apples are Edible. But we run into trouble when we try to abstract statistically. For instance: Apples are Fruits AND Mangoes are Yellow = Apples are Yellow is one possible "wrong" abstraction, which we can correct by structuring reinforcement learning to make the system recognize the condition: subject(2nd proposition) = object(1st proposition). Even after this corrective measure, there is still a possibility of error: Apple is red AND Red is stop-signal = Apple is stop-signal. Note that we don't have these problems with composition of functions, which raises the possibility of statistical abstraction of the architecture of mathematics (for all mathematical objects and operations are universal mapping properties, i.e., the best / energy-minimum configurations).
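A toy Haskell rendering of that analogy (propositions modelled as functions between one-element types, names mine): the valid syllogism is exactly a composite that type-checks, and the "wrong" abstraction is a composite the type checker rejects because codomain(1st) does not match domain(2nd).

```haskell
-- Syllogisms as function composition (illustration only).
data Apple  = Apple
data Fruit  = Fruit
data Edible = Edible
data Mango  = Mango
data Yellow = Yellow

applesAreFruits :: Apple -> Fruit
applesAreFruits _ = Fruit

fruitsAreEdible :: Fruit -> Edible
fruitsAreEdible _ = Edible

mangoesAreYellow :: Mango -> Yellow
mangoesAreYellow _ = Yellow

-- Apples are Fruits AND Fruits are Edible = Apples are Edible:
-- the composite exists because codomain(1st) = domain(2nd).
applesAreEdible :: Apple -> Edible
applesAreEdible = fruitsAreEdible . applesAreFruits

-- The "wrong abstraction" simply does not type-check:
-- applesAreYellow :: Apple -> Yellow
-- applesAreYellow = mangoesAreYellow . applesAreFruits
--   (rejected: Fruit is not Mango)

main :: IO ()
main = case applesAreEdible Apple of
  Edible -> putStrLn "Apples are Edible"
```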
Chad Nester said:
So, it may be a good thing if AI completely destroys the publishing system. It might push us into the kind of collective action needed to reform the system into something less absurd.
If academia survives the destruction, that is :grimacing:
Posina Venkata Rayudu said:
Morgan Rogers (he/him) How would you tell if I understood?
I don't need a special test, you can just tell me (and similarly for the collective). If papers aren't advancing the understanding of individual people, let alone the community, then they have no value.
What do you mean by a paper which doesn't advance the understanding of individual people or the community? If a paper is peer-reviewed and published in a journal (whether it is written by a human or an AI), how could it be such a paper?
at least in CS, lots of papers get published in peer reviewed venues that state 'easy' results that might not be accepted in a more prestigious venue, because e.g. tenure committees can't tell the difference
So if I understand correctly, an example of a paper which does not advance the understanding of the community could be a paper which states only 'easy' results. But if it's too easy, then the reviewers could say in their reviews that the paper should not be accepted?
To me, the serious reason why a researcher should be afraid of AI is that AI could perhaps diminish his/her/their perceived (or real) value in society, not that AI is going to endanger science.
(But I have no idea if it will happen or not, honestly.)
In 2019, in response to a MathOverflow question "Why doesn't mathematics collapse even though humans quite often make mistakes in their proofs?", I wrote
I think another reason that mathematics doesn't collapse is that the fundamental content of mathematics is ideas and understanding, not only proofs. If mathematics were done by computers that mindlessly searched for theorems and proof but sometimes made mistakes in their proofs, then I expect that it would collapse. But usually when a human mathematician proves a theorem, they do it by achieving some new understanding or idea, and usually that idea is "correct" even if the first proof given involving it is not.
At the time, the idea of "computers that mindlessly searched for theorems and proof but sometimes made mistakes in their proofs" seemed fanciful. But today, that's exactly what LLMs could do if let loose on mathematics, and I do think this could in principle endanger the scientific enterprise. Aside from the question of dealing with volume, it's not reasonable to expect human referees to catch "random" errors of the sort that LLMs will make. Human referees can't even reliably catch non-random errors, but that doesn't make mathematics collapse because, as I said, the humans writing the proofs usually have a correct idea in mind even if their proof was wrong.
Mike Shulman said:
At the time, the idea of "computers that mindlessly searched for theorems and proof but sometimes made mistakes in their proofs" seemed fanciful. But today, that's exactly what LLMs could do if let loose on mathematics, and I do think this could in principle endanger the scientific enterprise.
Not only could they do it, they do do it.
I'm speaking here from my experience as a moderator at MathOverflow. (As some of you know, I retired myself in September from that position, but @David Michael Roberts is one of the current moderators and will probably be able to speak to the current situation.) It wasn't even so much that LLMs would make mistakes in proofs; it was more that they spewed out garbage and nonsense when it comes to the high-level stuff. Because, after all, they don't actually understand what they're talking about. And it became a pestilence. Maybe it was the fault of MO users who queried ChatGPT about MO questions and then posted the output under their accounts, for not knowing how to use ChatGPT more skillfully; I couldn't say really. What I can say is that it drove moderators like me, and users who want to keep MO reliable and high-level, nuts (and it had something to do with my stepping down, frankly).
So what Mike is speculating is actually dangerously close to the truth of the matter.
Todd Trimble said:
What I can say is that it drove moderators like me, and users who want to keep MO reliable and high-level, nuts (and it had something to do with my stepping down, frankly).
I feel sorry for you and all the other moderators...
How is MO handling the issue? Is it just resting on the moderators' shoulders?
Not at all -- there are a lot of active MO users who help keep the place clean by flagging problematic actions for the moderators' attention.
High-reputation users have soft moderation tools (vote to close, flag-as-spam, for instance), and since their expertise is broader than that of the small group of elected mods, this is a big help, as they can spot nonsense in what to a non-specialist is entirely plausible symbol salad.
Since we enforced registration to participate, the amount of AI nonsense has definitely gone down. (We were also getting AI questions, not just answers...)
Jean-Baptiste Vienney said:
So if I understand correctly, an example of a paper which does not advance the understanding of the community could be a paper which states only 'easy' results. But if it's too easy, then the reviewers could say in their reviews that the paper should not be accepted?
This ignores the point I made about the increasing volume of papers that reviewers have to read. They could say it's not good enough, but they have less time per paper on average to make such a judgement, and as Mike explained, less experience catching the type of error that an AI trained to write plausible-sounding papers will make. In the worst-case scenario, it's possible for junk to pass this filter (again, this has already happened and been reported on in several fields).
Even if it's formally correct and a real person is taking credit for it, what is the value? A well-meaning person might communicate this work to the community in other ways, in which case there is some potential value, but someone who is using AI to generate papers to artificially boost their reputation (the direct equivalent of what Todd was reporting on MO) would be disincentivised from drawing more than superficial attention to such work in case their false authorship were to be revealed, undermining that reputation.
someone who is using AI to generate papers to artificially boost their reputation
And that too has been happening, of course. I'm thinking of a particular case that David R and I know from MO, now under a very long ban.
Of course, all this is peanuts compared to the criminal use of AI in insurance (I suppose all of you have heard about the United Health Care case and the targeted assassination of its CEO). I can't think of words strong enough to describe how shameful and wicked all this is.
Todd Trimble said:
Of course, all this is peanuts compared to the criminal use of AI in insurance (I suppose all of you have heard about the United Health Care case and the targeted assassination of its CEO).
I haven't, and the part of the article you shared that is available for free doesn't mention AI; what's the story? (Briefly, so as to not get too off topic :wink:)
It doesn't exactly fall under this topic, but I just read this Nature article about academic publishers licensing their content to train gen-ai models, and thought it would be good to share it here.
Indeed, a most unsavoury can of worms