You're reading the public-facing archive of the Category Theory Zulip server.
To join the server you need an invite. Anybody can get an invite by contacting Matteo Capucci at name dot surname at gmail dot com.
For all things related to this archive refer to the same person.
I hate to do this, but the four arXiv papers by this author, uploaded in the last week, all look wholly AI-generated to me: https://arxiv.org/search/?query=Reizi&searchtype=author
In particular, in the appendix of https://arxiv.org/abs/2503.16555 there is a promise of proofs, examples, and so on, but the entire text of the appendix is this:
In this appendix, we present additional proofs, detailed calculations, and further examples
that complement the results in the main text. In particular, the appendix includes:
* A complete proof of the back-and-forth construction used in Lemma 5.8.
* Detailed verifications of the functoriality of the Henkin and compactness-based model constructions.
* Concrete examples illustrating the construction of models for specific theories.
These supplementary materials are provided to offer deeper insight into the technical details and to demonstrate how our unified framework can be applied to various logical systems.
The next text is the bibliography and that's it. The content is also extremely banal.
After a cursory inspection of https://arxiv.org/abs/2503.16570, I agree.
I can't find any information about this supposed person online except an affiliation via their email, but I've made a report to the arXiv.
yep, no way a human wrote this
Stupid LLM forgetting the syntax for bold in TeX and falling back on Markdown...
I'm proud to say I called bullshit from the titles alone in my feed lol glad I wasn't wrong
Heh, we did an experiment on LLMs that produce SQL code, and for many of them, no matter how much you tell them not to format the output, they still do it. Stripping extra comments and markdown/html out of responses turned out to be the hardest part of interacting with the LLM in an automated flow.
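For what it's worth, here's a minimal sketch in Python of the kind of stripping that turned out to be necessary (the regexes and the helper name are illustrative, not our actual pipeline):

```python
import re

def extract_sql(response: str) -> str:
    """Strip markdown fences, stray HTML tags and chatter from an LLM reply."""
    text = response.strip()
    # Unwrap a surrounding markdown code fence like ```sql ... ```
    fenced = re.match(r"^```[a-zA-Z]*\n(.*?)\n?```$", text, re.DOTALL)
    if fenced:
        text = fenced.group(1)
    # Drop stray HTML tags some models wrap around the query
    text = re.sub(r"</?[a-zA-Z][^>]*>", "", text)
    # Drop full-line commentary such as "Here is the query you asked for:"
    lines = [ln for ln in text.splitlines()
             if not ln.lstrip().lower().startswith("here is")]
    return "\n".join(lines).strip()

print(extract_sql("```sql\nSELECT * FROM users;\n```"))  # -> SELECT * FROM users;
```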
Matteo Capucci (he/him) said:
I'm proud to say I called bullshit from the titles alone in my feed lol glad I wasn't wrong
Right, natural transformations between theorems.
I noticed there are two orders of the names used. Two of the papers are JRB, and two are BJR. What could be the point of that?
The email address seems to be attached to Open University Japan, so name-order may have been auto-generated differently for the different papers?
fosco said:
yep, no way a human wrote this
To be fair, I have seen researchers who just learned about category theory writing this way.
Anyway, the AI-generated slop CT papers are coming. I've noticed that Qwen 2.5 is trained on a lot of higher/formal category theory. It's fun to play with and it can produce approximately accurate references to results, which can sometimes cut down on search time. It's not yet good enough to generate any meaningfully creative results, and it's not enough to fool a half-keen eye, but I can imagine an undergrad using Qwen to write an undergrad thesis that nobody reads.
Noah Chrein said:
fosco said:
yep, no way a human wrote this
To be fair, I have seen researchers who just learned about category theory writing this way.
Anyway, the AI-generated slop CT papers are coming. I've noticed that Qwen 2.5 is trained on a lot of higher/formal category theory. It's fun to play with and it can produce approximately accurate references to results, which can sometimes cut down on search time. It's not yet good enough to generate any meaningfully creative results, and it's not enough to fool a half-keen eye, but I can imagine an undergrad using Qwen to write an undergrad thesis that nobody reads.
What is Qwen, and how did it happen that it was trained on so much category theory?
Qwen appears to be Alibaba's language model. I hadn't heard of it till now.
Perhaps the Chinese understand the importance of category theory to mathematics and hence to generalized cognition
There’s an interesting fake paper on the ArXiv today. I can’t really tell if it’s AI crankery or just the old fashioned kind. Did anybody glance at it? https://arxiv.org/abs/2505.22558
Kevin Carlson said:
There’s an interesting fake paper on the ArXiv today. I can’t really tell if it’s AI crankery or just the old fashioned kind. Did anybody glance at it? https://arxiv.org/abs/2505.22558
The excessive use of lists suggests AI
Right, that makes sense. It was harder to find obvious local absurdities than in papers further up this thread, which is disappointing.
There's a whole bunch recently that I have been complaining about (well, pointing out) elsewhere. The author is uploading a new paper every couple of days, and each title names something after himself. I'm happy to see today that they've been moved to math.GM! (as I suggested)
https://export.arxiv.org/find/math/1/au:+Alpay_F/0/1/0/all/0/1
And in the case at the top of the thread, namely https://arxiv.org/search/?query=Reizi&searchtype=author, all of these are now also classified as math.GM, not math.CT.
Seems like in theory the arXiv "endorsement system" should deal with AI generated papers just like any other spam, but I guess it doesn't work in practice? https://info.arxiv.org/help/endorsement.html
Yes, I'm a bit confused how all these people are getting endorsements.
At the very least it should be possible to "un-endorse" them after they've demonstrated their crankiness.
Another one! https://arxiv.org/abs/2505.22931
Maybe the arXiv needs to appoint a category theorist to the team of moderators...
I thought arXiv had a strong stance against crackpottery, so why are these papers allowed to remain under math.GM, rather than being removed entirely?
Kevin Carlson said:
Right, that makes sense. It was harder to find obvious local absurdities than in papers further up this thread, which is disappointing.
The phrase "discrete conformal field theory" in the abstract made me raise my eyebrows. As if that were a known thing. Given how much people try everything, there probably is some work on something called discrete conformal field theory, but....
Yeah, there's a paper Conformal Field Theory at the Lattice Level: Discrete Complex Analysis and Virasoro Structure trying to understand how conformal field theory is related to field theory on a lattice. But most conformal transformations don't map a lattice to itself, so this is bound to be rough, and the idea that "Recursive Difference Categories and Topos-Theoretic Universality" would have something to say about it is, umm, questionable.
Nathanael Arkor said:
I thought arXiv had a strong stance against crackpottery, so why are these papers allowed to remain under math.GM, rather than being removed entirely?
It can be hard to tell whether a math paper is crazy, and people whose papers are rejected entirely complain a lot, so it seems the arXiv folks find it convenient to put borderline papers into math.GM, expecting people 'in the know' to beware of such papers. That's my impression anyway.
It's more diplomatic than having math.CP for crackpot math.
This is a truly beautiful era to witness first-hand.
ViXra appears to be embracing the future...
But not unreservedly:
viXra.org only accept scholarly articles written without AI assistance. Please go to ai.viXra.org to submit new scholarly article written with AI assistance.
arxiv could use the exact same disclaimer, only changing the first instance of "vixra".
ai.viXra.org sounds like a fascinating crackpot sociology experiment. They have 343 papers so far. Within the subject of physics, most of the papers are on "relativity and cosmology", so we can guess that part of physics attracts crackpots the most. Within mathematics, 75% of the papers are on number theory.
Yesterday's first submitted paper on general relativity and cosmology:
The Pi-Periodic 22/7ths Dimension: A Quantum Gravity Framework for Dark Energy
We propose a novel 4+1-dimensional quantum gravity framework incorporating a compactified extra dimension, τ , with a periodicity of π (to 22 decimal places), symbolically tied to the rational approximation 22/7.
Someone is taking this 22/7 stuff very seriously! I believe Archimedes came up with this approximation to pi, and it was good enough that by the Middle Ages a bunch of mathematicians believed pi was 22/7.
by the Middle Ages a bunch of mathematicians believed pi was 22/7.
:surprise: Wait, is it not? /s
Archimedes squared the circle with this ONE WEIRD TRICK! Geometers hate him!
Actually I learned this when reading about the mathematician Franco of Liège. In 1020 he got interested in the ancient Greek problem of squaring the circle. But since he believed that pi is 22/7, he started studying the square root of 22/7. I don't know if he figured out how to construct the square root of 22/7 with straightedge and compass. But he did manage to prove that the square root of 22/7 is irrational!
Now, this is better than it sounds, because I believe the old Greek proof that the square root of 2 is irrational had been lost in western Europe at this time. So it took some serious ingenuity.
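(For anyone curious, here is a modern two-line version of that fact, surely not Franco's own argument: if $\sqrt{22/7} = p/q$ with $p, q$ positive integers, then
$$22\,q^2 = 7\,p^2,$$
and the prime $7$ would appear to an even power on the left but an odd power on the right, which is impossible.)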
Still, it's a sad reflection on the sorry state of mathematical knowledge in western Europe from around 500 AD to 1000 AD. It was better elsewhere at that time. I find this local collapse of civilization, and how people recovered, quite fascinating.
Could AI slop prompt some loss of collective intelligence now?
John Baez said:
Could AI slop prompt some loss of collective intelligence now?
In general any tool that helps you think makes you sloppier in some respect. So yes. For instance, ancient languages are often way more complicated grammatically than modern languages. One reason for this is that being able to say "Go around the mammoth, without being heard, by exactly half of a circle" in fewer words may have been a big advantage when we were hunter-gatherers, so languages tended to be more expressive. With civilization, the advent of writing, etc., we lost the need to formulate such complicated statements in a compact way, languages became less expressive, and we probably lost some of our cognitive ability in the process as well. It's always a tradeoff.
I'm thinking more about how successive generations of Roman summaries of Greek scientific texts watered them down to a homeopathic dilution of their original strength. Then many of the originals were lost, at least in western Europe.
@Fabrizio Romano Genovese: could you share a reference for the claim that older languages have higher entropy than modern languages?
13 messages were moved from this topic to #meta: off-topic > language: the rise and fall of complex grammars by John Baez.
Fabrizio Romano Genovese said:
John Baez said:
Could AI slop prompt some loss of collective intelligence now?
In general any tool that helps you think makes you sloppier in some respect. So yes. For instance, ancient languages are often way more complicated grammatically than modern languages. One reason for this is that being able to say "Go around the mammoth, without being heard, by exactly half of a circle" in fewer words may have been a big advantage when we were hunter-gatherers, so languages tended to be more expressive. With civilization, the advent of writing, etc., we lost the need to formulate such complicated statements in a compact way, languages became less expressive, and we probably lost some of our cognitive ability in the process as well. It's always a tradeoff.
uuuhmm what's a reference for this? smells really funny to me...
New AI paper up:
This one is funny because it outs itself
image.png
I actually approve of this way of approaching AI tools: personally, I don't think they automatically disqualify a paper. The principle should be that the author is ultimately responsible for checking their results, and remains fully accountable.
Our results offer a formal justification for this procedure, suggesting that the analytic continuation is not arbitrary but is in fact forced by the underlying principles of symmetry and normalization.
Kind of a funny quote because the analytic continuation of a function is one of the most rigidly determined and least arbitrary constructions in mathematics
Indeed, the paper is (from a quick skim) likely formally correct but basically insubstantial; it's a big cargo-cult regurgitation. The whole thing seems circular.
Matteo Capucci (he/him) said:
I actually approve of this way of approaching AI tools: personally, I don't think they automatically disqualify a paper. The principle should be that the author is ultimately responsible for checking their results, and remains fully accountable.
Yeah, I think I agree. At the very least we might have to get used to seeing that writing style everywhere; I can imagine a non-native speaker feeling a lot of pressure to use it to make their wording seem natural. It doesn't inherently disqualify the paper. But, on the other hand, it makes me suspicious and vigilant for errors, and at that point even a small error would be enough to cause me to discard it.
I suspect at some point someone will try to mass-produce papers and submit them everywhere (it's almost trivial to compile a list of journals in the math/CS area; let the machine prepare a different paper for each item on the list; let the machine submit, let the machine handle the rebuttals and modify the paper accordingly, resubmit...), relying on a small probability of success over a high number of trials.
It's the academia equivalent of asking out 100 girls, one of them will say yes.
These are very interesting times to witness. Especially if you're an irredeemable nihilist.
I think there's a real risk of the image of CT being tarnished if this type of stuff becomes too common. The number theory people know how to funnel cranks away from their arXiv category, if category theorists can't do this, it's not a good look.
Also, mathematicians generally are conscious of the circle squarers and the number theory cranks and so on, and can spot this stuff pretty easily, because it's on a hot-button topic and shows the usual obvious signs. But something in category theory applied to other areas (not Applied Category Theory, but to an outsider it's not necessarily obvious) that plays to the stereotypes of CT's abstract nonsense moniker just looks like another silly CT paper that claims to revolutionise our understanding of a piece of classical mathematics when really it's empty of real content.
Perhaps not among hardcore mathematicians, who would almost surely recognise the problem and commiserate, but anyone merely adjacent, for instance someone with money who might need to be convinced to fund some real and good ACT, may get wind of this AI nonsense.
Maybe I'm being too pessimistic here. But these are ideas that occur to me
The number theory people know how to funnel cranks away from their arXiv category
How?
Well, I get math.NT daily announcements and I've never seen a crank number theory paper, and yet I know they do turn up in math.GM.
So somehow they manage it.
Have we taken any action about these papers? Contacted anyone at arXiv about removing them and un-endorsing the submitters? That seems to me to be the obvious first step. I'd be willing to help if needed, although I don't have the time to filter the daily submissions for them myself.
David Michael Roberts said:
But something in category theory applied to other areas (not Applied Category Theory, but to an outsider it's not necessarily obvious) that plays to the stereotypes of CT's abstract nonsense moniker just looks like another silly CT paper that claims to revolutionise our understanding of a piece of classical mathematics when really it's empty of real content.
I wonder if an effect like this could be what's causing the problem by making it easier for cranks to get endorsed with CT papers by lazy non-category-theorists.
David Michael Roberts said:
Well, I get math.NT daily announcements and I've never seen a crank number theory paper, and yet I know they do turn up in math.GM.
To me, that just suggests that the arXiv editors are better at detecting crank NT papers than crank CT papers, likely because they have had more practice at it.
Mike Shulman said:
Have we taken any action about these papers? Contacted anyone at arXiv about removing them and un-endorsing the submitters? That seems to me to be the obvious first step. I'd be willing to help if needed, although I don't have the time to filter the daily submissions for them myself.
I’ve contacted the ArXiv about the first batch of these that came up. They said they’d look into it but don’t share results of investigations. I haven’t checked whether the papers are down. It feels like fingers in a dike if we can’t figure out who is endorsing these authors though!
Did your first batch include Recursive Difference Categories and Topos-Theoretic Universality by Andreu Ballus Santacana? That was a crank paper discussed here earlier. It's still up! Santacana is also responsible for the new one you folks are talking about today, Analytic Uniqueness of Ball Volume Interpolation: Categorical Invariance and Universal Characterization.
I checked earlier, and Santacana appears to be in the department of philosophy of UAB Barcelona.
(He's definitely got the Grothendieck bald-head thing going on.)
I reported the papers of Barreto that David Roberts opened this thread with. Unfortunately they're still up and there have been two more since then. They're all in GM now, though, which I guess is the best we can generally hope for.
The moving to math.GM has been patchy. Some of the ones by one author whose primary listing is cs.LO haven't moved, while those that were listed under math.CT have. Presumably because computer scientists are even less well-equipped than a generic mathematician to judge which CT is actually AI-generated crank material.
If the people submitting these things are actually employed by reputable institutions, perhaps we should contact their employers.
The first person I reported is unlocatable online, IIRC. But that’s apparently not the case for everyone.
David Michael Roberts said:
I think there's a real risk of the image of CT being tarnished if this type of stuff becomes too common. The number theory people know how to funnel cranks away from their arXiv category, if category theorists can't do this, it's not a good look.
Every week, one or two of these papers make it into the math.LO / cs.LO announcements, which is frankly ridiculous. We had a person who just had a couple of their articles GM-holed last week get through to cs.LO again this week. Especially disappointing since at the same time, I know multiple people with solid academic affiliations, long records in logic, and academic email addresses who've seen their announcements blocked/delayed while they appealed (e.g. conference extended abstracts misclassified and rejected as "abstract-only submissions", or a PhD thesis randomly rejected) :/
I don't think this tarnishes the image of logic itself, but it's certainly a big source of noise and not a good look for arXiv moderation.
A possible solution is to set up a small website that collects these papers and flags them as "probably bollocks". A small number of us, committed to expressing a judgment, would evaluate these submissions, pointing out "this passage is AI generated", "the second sentence on page 2 doesn't make any sense", etc.
It takes a lot of work, but we all know what the rule is here:
image.png
I agree that this state of affairs tarnishes the reputation of category theory/ists, and I think there is only one way to nip the problem in the bud, that is, taking responsibility and vehemently asserting that "yeah, no, we do not recognize this shit as category theory or even as decent math"
I'm not sure this kind of "negative curation", in which we maintain lists of things that are bad, is the way to go.
In an ideal world the function of journals is to be lists of things that are good, or at least probably not bad.
One option is to make sure all these dodgy AI-generated papers have comments on PubPeer. See eg https://www.pubpeer.com/publications/D52D1CC22593701472A83CFB9C2FD8 If the obvious red flags are documented here, then a list of links can be curated in a place category theorists have control over, or sent to arXiv admins, or employers of people making this nonsense.
Chad Nester said:
I'm not sure this kind of "negative curation", in which we maintain lists of things that are bad, is the way to go.
History disagrees: https://en.wikipedia.org/wiki/Index_Librorum_Prohibitorum. Lists of things that are bad can be used to repress heresy.
fosco said:
Chad Nester said:
I'm not sure this kind of "negative curation", in which we maintain lists of things that are bad, is the way to go.
History disagrees: https://en.wikipedia.org/wiki/Index_Librorum_Prohibitorum. Lists of things that are bad can be used to repress heresy.
Index Paperorum Crackpoti
fosco said:
Chad Nester said:
I'm not sure this kind of "negative curation", in which we maintain lists of things that are bad, is the way to go.
History disagrees: https://en.wikipedia.org/wiki/Index_Librorum_Prohibitorum. Lists of things that are bad can be used to repress heresy.
An inquisition would, at least, be entertaining :)
A new paper in quant-ph supposedly connecting modular tensor categories to quantum contextuality smells like LLM to me.
Pages 7 and 8 definitely look like over-optimistic generalities of dot points. And the 'proof' here is altogether lacking in convincing detail in the last two sentences....
Proposition 4.3. The braid group representation derived from the Fibonacci category violates the KCBS inequality maximally, demonstrating strong contextuality intrinsic to its topological structure.
Proof. Projectors onto fusion basis states corresponding to the object generate a measurement scenario isomorphic to the pentagon graph underlying the KCBS inequality [9, 11]. The noncommuting braid generators create measurement contexts whose statistical correlations surpass classical bounds. Numerical evaluation of expectation values using explicit ρF matrices confirms maximal violation
It's just verbiage, with the convenient out that "numerical evaluation" will bear out the claim. In other words, bullshit.
Another AI-slop paper from Reizi:
https://arxiv.org/abs/2506.21653
Primary subject this time math.LO not math.CT. Also, name changed from Barreto Joaquim Reiz to Higuchi Joaquim Reizi. The formatting is also weirdly broken, with line numbers appearing inconsistently...
Identical submission email, this person needs to be put on a special watch-list.
Have you considered emailing folks at the arXiv, where these suggestions can have some effect?
I'm working on that, too, through a more senior person in the logic community, behind the scenes. I'm just cataloguing them here for the benefit of people who might see them and would otherwise waste time looking at them (though this one is pretty blatant). I can stop if it's too much noise.
It's not too much noise; people can always mute this thread if they want. I'm just glad you're trying to actually do something about this.
Someone pointed me at the arXiv moderation contact form, I put my case to them.
Thanks, yes that's an easy way to contact the moderators. Since you don't sound like a crackpot, they should take you seriously, though action may be slow, and almost surely near-silent.
E.g. I asked them whether they had an international backup of the arXiv, and they never replied, but now they have one.
(I'm not claiming I caused this, but it was an obvious thing to want so I'm glad they have it now.)
Number of days since an AI-generated slop article made it to math.LO: zero. Again.
Yes. Also a repeat offender. But also it's going to math.FA, and not even cross-listed to math.CT, despite the title and the topic.
I encourage people to use the 'arXiv moderation user support' contact link here: https://arxiv-org.atlassian.net/servicedesk/customer/portal/2 and let the moderators know about the suspect paper(s) that turn up. Be specific in your report as to what makes you think it is LLM-generated, with examples from the paper that no human would write, if possible.
Zoltan A. Kocsis (Z.A.K.) said:
Number of days since an AI-generated slop article made it to math.LO: zero. Again.
"Submitted to Inventiones Mathematicae"
LOL
It seems like the author is picking a different primary classification for every paper
First one was math.CT, second cs.LO, third math.RT, fourth math.FA
Looks like it could be a conscious effort to avoid moderation
Zoltan A. Kocsis (Z.A.K.) said:
Number of days since an AI-generated slop article made it to math.LO: zero. Again.
And again zero :(
I wonder what could be a community Plan B if the arXiv moderators are unable to cope with this wave and the arXiv becomes viXraised. ArXiv overlays with additional community filters?
In particular, it doesn't look like arXiv's "get endorsement or academic-email" system can be tightened any further without causing undue difficulty to regular academics who want to post genuine preprints.
@Zoltan A. Kocsis (Z.A.K.) Is this about "systemic contraints"? ;-) The one graphic in that paper looks like a typical not-that-good LLM trying to make a technical image.
uhm, would a webpage with a list of 'suspected slop' preprints be too strong of a reaction to this phenomenon? I'd be willing to set that up, and have people submit entries to me
"The Wall of Shame"
Sounds good to me @Matteo Capucci (he/him)
By the way, it's wise to be quite polite and cautious in your public description of this web page, to reduce your chance of getting sued and/or harrassed. Having gotten threats from people I criticized publicly, I can assure you it's not much fun.
If it's not wholly AI-generated, it's at least got lots of LLM fingerprints all over the formatting and structure https://arxiv.org/abs/2507.04089
https://arxiv.org/search/math?searchtype=author&query=Hajebi,+P :-(
Not an AI-generated paper, but I just stumbled on an AI-generated blog article about fibrations. (At the very least, the website is quite honest, since it reveals that the author is Llama-4)
It's amusing how the definition of fibration is wrong. Also funny how the grammar is frequently wrong in the same way:
Fibration is a fundamental concept
Relationship between Fibration and Other Category Theory Concepts
Fibration is closely related to
etc.
Of course I have to be amused, because otherwise I'd break down and cry about how the pool of human knowledge is getting contaminated by sludge like this.
:poop:
I would guess the places where the grammar is funny are where the human used search+replace to prepare the prompts to write the pages. This is all mass-produced; it's quite harrowing.
I'd like to think all this AI slop will drive the value of expertise upward; after all, now the world not only needs John to write blog posts, but to correct AI generated slop posts too.
If the world needs me to do that, the world is in deep trouble.
John Baez said:
Of course I have to be amused, because otherwise I'd break down and cry about how the pool of human knowledge is getting contaminated by sludge like this.
This prompted me to wonder (somewhat fancifully) whether we could create a parallel humans-only Internet. Then we could cede the current Internet to the AIs, who would eventually implode due to model collapse.
At the very least, I'm thinking seriously about not posting new preprints on the arXiv any more, or any other public site from which they could be scraped to train AIs to generate mathematical-sounding slop. Surely there'd still be some way to make them freely accessible to humans.
And I suppose the same should apply to blog posts.
You have to both figure out some this-is-a-real-human verification method, which probably means some kind of biometrics, and also some method of preventing anybody from just downloading the human-only Internet and feeding it to the bots offline... The first one is solvable, if invasive, but I'm really not sure how to do the second. DRM for every file on your Internet? Bleh
Mike Shulman said:
At the very least, I'm thinking seriously about not posting new preprints on the arXiv any more, or any other public site from which they could be scraped to train AIs to generate mathematical-sounding slop. Surely there'd still be some way to make them freely accessible to humans.
I don't understand how this helps anyone? AI will still be used to output nonsense whether or not it is trained on your specific papers. It will also be trained on any papers you publish in journals in any case. All you would be doing is making it harder for humans to access your articles.
Kevin Carlson said:
You have to both figure out some this-is-a-real-human verification method, which probably means some kind of biometrics, and also some method of preventing anybody from just downloading the human-only Internet and feeding it to the bots offline... The first one is solvable, if invasive, but I'm really not sure how to do the second. DRM for every file on your Internet? Bleh
I don't know about the second problem. The first problem (distinguishing humans from bots) is indeed already an issue (e.g., if I remember correctly, Facebook removes billions of fake accounts every year). Some people in the cryptography/privacy world work on that, something along the lines of proving that you hold a state-issued ID card without revealing the details (using zero-knowledge proofs).
I guess this will probably trigger an arms race between "human-detectors" and "human-provers".
Kevin Carlson said:
also some method of preventing anybody from just downloading the human-only Internet and feeding it to the bots offline
I was imagining that human users would be constantly verified (however that would work) whenever they access an individual document, so they couldn't just log in once and then click "download the Internet" and get it all.
I did say it was fanciful. But if the alternative is ceding the Internet to the AIs and having nothing to replace it with, maybe we should be working harder on it.
Yes, I'm pretty sympathetic.
Graham Manuell said:
AI will still be used to output nonsense whether or not it is trained on your specific papers.
It's like voting, or reducing your carbon footprint. Anything any individual person does has a miniscule effect on the world, but the world is made up of individuals, so we should all follow the categorical imperative.
But I suppose you're right that currently we have no technical solution for disseminating information to humans only, and if we want to stay in this job we have to disseminate our research in some way. I guess we can hope that the pending copyright lawsuits against AI trainers bear some fruit...
There is another line of research focusing on "voluntarily poisoning" your data, so that an AI trained on a dataset including your data could be flagged. See e.g. this paper.
That's a nice idea. I don't suppose it's possible for those of us who don't work with data?
Mike Shulman said:
Graham Manuell said:
AI will still be used to output nonsense whether or not it is trained on your specific papers.
It's like voting, or reducing your carbon footprint. Anything any individual person does has a miniscule effect on the world, but the world is made up of individuals, so we should all follow the categorical imperative.
That's why we do category theory! :upside_down:
I do a lot of things that seem a bit quixotic in that they have a miniscule effect. But I wouldn't stop posting my papers to the arXiv because the positive effect of spreading my ideas to more human mathematicians seems to grossly outweigh the negative effect due to AIs reading them.
One slightly quixotic act I've been enjoying is not using Google and instead paying to use a search engine called Kagi which doesn't show me advertisements, doesn't rely on ad revenue, isn't as susceptible to search engine optimization tricks, doesn't push AI on me, and has a greater variety of search filters, like an "academic" filter. I heard about this from Cory Doctorow, who has very interesting things to say about enshittification (a term he invented).
I have little hope that the deluge of AI-generated content will abate. However, it seems more realistic to me that a human verification process could be used to whitelist authors who are known not to be bad actors. arXiv's current verification process is entirely insufficient, but I feel it can be addressed if arXiv actually takes action. However, a severe disadvantage of this is that it makes academia even less accessible to those outside of it than it already is (because most likely the only entrypoint to verification would be via a verified user/institution).
In some sense, I feel it doesn't matter if huge amounts of AI slop is generated (distasteful as it is), so long as it's possible to filter out. In this case, I think that only permitting trusted users to post research is more important than only permitting trusted users to view research.
John Baez said:
One slightly quixotic act I've been enjoying is not using Google and instead paying to use a search engine called Kagi which doesn't show me advertisements, doesn't rely on ad revenue, isn't as susceptible to search engine optimization tricks, doesn't push AI on me, and has a greater variety of search filters, like an "academic" filter.
That's very interesting! I see that they do also supply an AI, which seems to contradict their goal of "humanizing the web", but I gather from your remarks that you can turn it off. How does Kagi compare to Google with udm14, which disables ads and AI?
Mike Shulman said:
That's a nice idea. I don't suppose it's possible for those of us who don't work with data?
In principle, the idea works on a vectorial representation of the data, and thus should be applicable to text. However, text is more complicated in practice because the mapping "text → vector" is less flexible than, e.g., "image → vector", so the poison is harder to craft.
Also, since an individual author only provides a "few" samples, I don't know how relevant the technique can be for individual usage. I expect journals or any document archive to be more likely to be "clients" of this approach. Anyway, this is still research, so there is no off-the-shelf software/service available for now, as far as I know.
Mike Shulman said:
That's very interesting! I see that they do also supply an AI, which seems to contradict their goal of "humanizing the web", but I gather from your remarks that you can turn it off.
I must have turned it off immediately, because I never see it.
How does Kagi compare to Google with udm14, which disables ads and AI?
I'll have to compare them for a while. So far they look comparable except udm14 doesn't have those various filter settings. Kagi claims to have boolean search but it seems to be working erratically - maybe I'm not using it right. It also claims you can search by "least relevant first", which is hilarious.
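(In case anyone hasn't tried it: as far as I understand, "udm14" just means appending the udm=14 parameter to a Google search URL, which switches to the plain "Web" results tab, e.g. https://www.google.com/search?q=enriched+category&udm=14.)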
Here's Cory Doctorow on Google and related things. (It's long, but folks can skip it if they don't care.)
That's where Ardoline and Lenzo's work comes in. They both document the ways in which we turn these online services into cognitive prostheses, and then investigate how the enshittification of these services ends up making us stupider, by taking away the stuff that helps us think. They're drawing a line between platform decay and cognitive decay.
The authors look at examples like the enshittification of Google Search, a product that Google has deliberately and irretrievably enshittified:
https://pluralistic.net/2024/04/24/naming-names/#prabhakar-raghavan
The web is a giant cognitive prosthesis, and early web tools put a lot of emphasis on things like bookmark management and local caching, so that the knowledge and cognition you externalized to the web were under your control. But Google Search was so goddamned magic – before they cynically destroyed it – that a lot of us switched from "not remembering things because you have a bookmark that takes you to a website that remembers it for you" to "not remembering things and not remembering where to find them, and just typing queries into Google." The collapse of Google into a giant pile of shit is like giving every web user a traumatic brain injury.
It's a good paper, but I think the situation is actually more dire than the paper makes it out to be, thanks to the AI bubble –
Wait! I'm not actually going to talk about what AI can do (which is a combination of a small set of boring useful things, a bunch of novelties, and a long list of things that AI can't do but is being used to do anyway). I'm talking about the financial fraud that AI serves.
Tech companies must be perceived as growing, because when a company is growing, it is valued far more highly than a company is once it has "matured." This is called the "price to earnings ratio" – the number of dollars investors are willing to pay for the company compared to the number of dollars a company is bringing in. So long as a company is growing, the PE ratio is very high, and this helps the company to actually grow. That's because the shares in growing companies are highly liquid, and can be traded for equity in other companies and/or the labor of key employees, meaning that growth companies can almost always outbid their mature counterparts when it comes to expanding through acquisition and hiring. That means that while a company is growing, its PE ratio can help it keep growing.
But here's the corollary: when a growth company stops growing, its shares are suddenly and violently revalued as though they were shares in a mature company, which tanks the personal net worth of the company's top managers and key employees (whose portfolios are stuffed with their employer's now-plummeting stock). Worse: in order to retain those employees and hire more (or to acquire key companies), the no-longer-growing company has to pay with cash, which is much harder to get than its own shares. Even worse: they have to bid against growing companies.
A growth company is like an airplane that has two modes: climbing and nose-diving, and while it's easy to go from climbing to crashing, it's much harder to go the other way. Ironically, the moment at which a company's growth is most likely to stall is right after its greatest triumph: after a company conquers its market, it has nowhere else to go. Google's got a 90% Search market-share – how can it possibly grow Search?
It can't (just like Meta can't really grow social, and Microsoft can't grow office suites, etc), so it has to convince Wall Street that it has a shot at conquering some other market that the street perceives as unimaginably vast and thus capable of keeping the growth engine going. Tech has pulled a lot of sweaty tricks to create this impression, inflating bubbles like "pivot to video" and "metaverse" and "cryptocurrency," and now it's AI.
The problem is that AI just isn't very popular. People go out of their way to avoid AI products:
https://www.tandfonline.com/doi/full/10.1080/19368623.2024.2368040
For an AI-driven growth story to work, tech companies have to produce a stream of charts depicting lines that go up and to the right, reflecting some carefully chosen set of metrics demonstrating AI's increasing popularity. One way to produce these increasing trend-lines on demand is to replace all the most commonly used parts of a service that you love and rely on with buttons that summon an AI. This is the "fatfinger AI economy," a set of trendlines produced by bombarding people who graze their screens with a stray fingertip with a bunch of AI bullshit, so you can claim that your users are "engaging" with AI:
https://pluralistic.net/2025/05/02/kpis-off/#principal-agentic-ai-problem
It's a form of "twiddling" – changing how a service works on a per-user, per-interaction basis in order to shift value from the user to the company:
https://pluralistic.net/2023/02/19/twiddler/
Twiddling represents the big cognitive hazard from enshittification during the AI bubble: the parts of your UI that matter most to you are the parts that you use as vital cognitive prostheses. A product team whose KPI is "get users to tap on an AI button" is going to use the fine-grained data they have on your technological activities to preferentially target these UI elements that you rely on with AI boobytraps. You are too happy, so they are leaving money on the table, and they're coming for it.
This is a form of "attention rent": the companies are taxing your muscle-memory, forcing you to produce deceptive usage statistics at the price of either diverting your cognition from completing a task to hunt around for the button that banishes the AI and lets you get back to what you were doing; or to simply abandon that cognitive prosthesis:
https://pluralistic.net/2023/11/03/subprime-attention-rent-crisis/#euthanize-rentiers
It's true "engagement-hacking": not performing acts of dopamine manipulation; but rather, spying on your habitual usage of a digital tool in order to swap buttons around in order to get you to make a number go up. It's exploiting the fact that you engage with something useful and good to make it less useful and worse, because if you're too happy, some enshittifier is leaving money on the table.
I think there is a new danger in a kind of "crank singularity" happening.
I'll admit I am not an experienced crankologist with decades under my belt like Dr. Baez, but I've noticed there is a stark difference in two different classes of cranks. I often stop by reddit and view the local "alt-physics" subreddits. I refer to these as "crank aquariums".
The "lower cranks" are mostly mystical, don't know much math and talk about the typical "woo" topics. Consciousness collapses the wavefunction, sacred geometry, you know the drill.
The "higher cranks" often use very sophisticated math to prove "new theorems". But their math is PDE heavy and fundamentally brittle. They discover "new terms" that Maxwell and Schrodinger "forgot" in their equations. They never use things like category theory, homology, moduli spaces, Lie groups, etc. They only do very heavy analysis.
The problem is that with new LLMs, these two separate classes of cranks could merge into a new form of "hybridized super-crank" generating endless reams of "self-conscious quantum operator algebras" and "quasi-cosmic graviton quantum field theories" with actual PDEs that are potentially sophisticated enough to overwhelm hapless journal editors.
Maybe there could be an additional filter (Category Theory CAPTCHA?) of some kind where people hoping to publish could go in front of an AI interviewer and answer randomly generated questions about group theory, topology, cohomology classes, functors and sheaves. If you pass, you receive a badge of some kind (this used to be referred to as a "degree", I believe?) These kinds of topics are usually too abstract for cranks to actually understand, so it could be a "mental block".
Anyway, I'm partly spitballing here, but the true lesson to remember is that mathematics chases deep structure, whereas cranks may only imitate it shallowly. (I am serious about that, but I felt some comic relief is also in order here. I hope the mods won't exile me to the crank aquarium!)
I'll add my piece of comic relief. There's a certain resistance to flagging some content as AI-generated slop, for a similar reason that people are very wary of flagging a text as plagiarism... No problem, I can do it :smiling_devil: I have some experience, and it's very pleasurable for me to tell someone who deserves it "shut up and come back to me when you know what a determinant is"
Ben Kaminsky said:
I think there is a new danger in a kind of "crank singularity" happening.
It's happening. I'm getting many more emails from cranks, who are mostly working with LLMs to develop their 'theories'. A quote from one of these emails:
"developed rigorously with the help of large language models"
:rolling_eyes:
The "higher cranks" often use very sophisticated math to prove "new theorems". But their math is PDE heavy and fundamentally brittle.
I haven't seen any cranks using very sophisticated math. Some pretend to do so. And I agree with you that the number is dramatically increasing now that fake math is easy to get from a LLM. Luckily anyone who really knows math can see this stuff is fake.
maybe this has been brought up before, but what about adding "community notes" to the arxiv, like how it works on twitter? That's how twitter was able to pass moderation responsibility onto the public at scale.
This is a dangerous plan, for several reasons that people noticed about 15 minutes after first thinking of this idea a couple of decades ago. Some still favor it, but the arXiv moderators aren't going to take those chances.
Note however that anyone can start their own "arXiv reviews".
just out of curiosity, why would it not work for the arxiv (what are those reasons?) if it does work for twitter (or maybe it doesn't work for twitter?)? the only thing I can think of off the top of my head is that there is too small of a "public of experts" for the community notes to be accurate
My impression is that it's at least questionable whether it works for twitter.
Although I don't use twitter myself, so I can't speak from personal experience.
I'm also curious. Though I'm no longer on X/Twitter, my impression is that community notes continued to function quite well even as the platform deteriorated in other respects, pretty reliably flagging false or misleading posts.
Imagine if everyone got to say shit about each other's papers on the arXiv. It would be a bloodbath. It would quickly degenerate into obscenities and lawsuits unless the comments were moderated. Some of those lawsuits would even target the arXiv itself. All of this is the last thing the arXiv moderators want. They don't have time for this.
To be successful, such an approach would need to carefully circumscribe the allowed claims for a community note. The allowed claims would not include random opinions about or reviews of papers. Rather, the point would be to use the community to reach consensus on matters of fact, such as:
Evan Patterson said:
such an approach would need to carefully circumscribe the allowed claims for a community note
To enforce that, all the comments would have to be moderated, right?
The arXiv staff might prefer to spend their limited time/energy/money on moderating papers rather than moderating comments on papers.
I suppose one approach to avoid moderating comments would be a form which allowed no freely written text, just yes/no answers to questions like "this paper is AI-generated". This would have its own problems.
John Baez said:
The arXiv staff might prefer to spend their limited time/energy/money on moderating papers
And apparently they don't even have enough time/energy/money to do a good enough job of that, which is what led to this whole conversation. (Not intended as a criticism of them, just an observation about lack of resources.)
I don't suppose any agency would be likely to award a grant to support work to keep AI-generated slop off of the arXiv.
here's how it works on twitter: The Community Notes algorithm publishes notes based on agreement from contributors who have a history of disagreeing. Rather than based on majority rule, the program's algorithm prioritizes notes that receive ratings from a "diverse range of perspectives". For a note to be published, a contributor must first propose a note under a tweet. The program assigns different values to contributors' ratings, categorising users with similar rating histories as a form of "opinion classification", determined by a rough alignment with the left and right-wing political spectrum. The bridging-based machine-learning algorithm requires ratings from both sides of the spectrum in order to publish notes, which can have the intended effect of decreasing interaction with such content.
Contributors are volunteers with access to an interface from which they have the ability to monitor tweets and replies that may be misleading. Notes in need of ratings by contributors are located under a "Needs your help" section of the interface. Other contributors then give their opinion on the usefulness of the note, identifying notes as "Helpful" or "Not Helpful". The contributor gets points if their note is validated, known as "Rating Impact", which reflects how helpful a contributor's ratings have been. X users are able to vote on whether they find notes helpful or not, but must apply to become contributors in order to write notes, the latter being restricted by "Rating Impact" as well as the Community Notes guidelines.
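As a toy illustration of the "bridging" requirement described above (very much a caricature: the real system infers the opinion clusters by fitting a model to the whole rating history, and the thresholds here are made up):

```python
from collections import defaultdict

def publish(note_ratings, contributor_cluster, per_cluster_min=2):
    """note_ratings: list of (contributor, rated_helpful) pairs.
    contributor_cluster: contributor -> opinion-cluster id, assumed given here."""
    helpful_by_cluster = defaultdict(int)
    for contributor, rated_helpful in note_ratings:
        if rated_helpful:
            helpful_by_cluster[contributor_cluster[contributor]] += 1
    # A note is shown only with enough "helpful" ratings from at least two
    # different clusters, i.e. from people who usually disagree with each other.
    supporting = [c for c, n in helpful_by_cluster.items() if n >= per_cluster_min]
    return len(supporting) >= 2

clusters = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 1}
ratings = [("a", True), ("b", True), ("c", True), ("d", True), ("e", False)]
print(publish(ratings, clusters))  # True: helpful ratings from both clusters
```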
OK, so how many people in the world do you think a) will have a legitimate informed opinion on a somewhat niche research (sub)field and b) will engage in the commenting process on arXiv papers? Twitter notes worked (or "worked") because of scale. If you have a hundred thousand people, a million people, engaging on a topic that doesn't require PhD-level education to understand even the words, then this type of approach might achieve some level of community consensus.
This reminds me of the “wisdom of the crowd” phenomenon!
maybe it would be enough for the experts to be more numerous than the crackpots, rather than needing huge numbers of experts? (hopefully there are more experts than crackpots, but I have no idea tbh). But I suppose the discussion is moot without the arxiv actually doing it.
https://www.daniellitt.com/blog/2025/7/17/arxiv-in-trouble
Ryan Wisnesky said:
maybe it would be enough for the experts to be more numerous than the crackpots, rather than needing huge numbers of experts? (hopefully there are more experts than crackpots, but I have no idea tbh). But I suppose the discussion is moot without the arxiv actually doing it.
I feel like almost definitionally there would have to be more crackpots than experts, owing to the relative difficulty in becoming either?
John Baez said:
Imagine if everyone got to say shit about each other's papers on the arXiv. It would be a bloodbath. It would quickly degenerate into obscenities and lawsuits unless the comments were moderated.
I think you're too pessimistic, John; people tend to be decent 99% of the time, especially if their full name is on display. I agree there would be the need to moderate the remaining 1% though.
Ruby Khondaker (she/her) said:
I feel like almost definitionally there would have to be more crackpots than experts, owing to the relative difficulty in becoming either?
I assumed the only people allowed to play this game would be people who have been endorsed to write papers on the arXiv. In this population there are more experts than crackpots.... though not everyone is an expert on every topic, indeed quite the opposite.
If you let random passers-by evaluate arXiv papers, there will definitely be lots of crackpots and people with grudges and other unproductive motivations.
Yeah of course! And it'd be spamland very quickly...
I think the whole problem is addressed by fixing the endorsement system. If arXiv is unable to moderate by themselves (which appears to be the case), then they need to either hire more people, or ask for trusted volunteers.
So much of the academic publishing system already depends upon volunteers, it doesn't feel like a stretch to have people contributing to moderating arXiv. It'd be great to have it become a place of recorded scientific discussions re a piece of work, including errata, reviews, and comments, so that it can also be a starting point for journal reviews.
The arXiv does a lot of moderating, and it's sometimes too strict: the case of Phillip Helbig comes to mind:
I've had a paper shifted from the group theory section to combinatorics against my will. I think their problem with AI-generated papers is that they're not used to filtering out papers of this sort. Most crackpots write in a way that sends off a particular vibe, which is easy for experienced moderators to detect. But LLMs are different. I think one can learn to detect them, at least so far.
In comments awaiting moderation (heh) on Peter Woit's blog that pointed out Daniel Litt's blog post on the topic, I wrote
I would like to see anyone whose papers were deemed to be AI-generated have all their endorsements stripped, and they should need to get fresh endorsements, probably more than one.
Moreover, I would even go so far as to propose that anyone who endorses an AI-generated paper should have their endorser-status reset, so that they cannot immediately re-endorse the person who they originally endorsed, until they have submitted more papers as usual. It would be a bit of an incentive to actually look at the paper for fear of a relatively harmless removal of a privilege. Active researchers would get back to having endorser powers before too long...
John Baez said:
The arXiv does a lot of moderating, and it's sometimes too strict
Clearly not enough (though being sometimes too strict is another problem).
I don't think a sheer increase in quantity is the best solution, given that the arXiv has an approximately zero budget for doing this. Moderation needs to be focused on the key problems. Now that AI is a big problem, the moderators need to be pointed to AI-generated papers, and they need to learn to spot them. They do look at every paper, I believe.
I don't think a sheer increase in quantity is the best solution, given that the arXiv has an approximately zero budget for doing this.
That would be solved by having volunteers. It would not take many volunteers for each category, and I think this would be relatively easy to achieve, as I think many people would be willing to help.
And now we're back full circle to the question we started with: how do we contact the arXiv moderators and get them to do something?
Possibly someone could email moderators@arxiv.org (mentioned on https://info.arxiv.org/help/moderation/index.html) and see whether they can give any information?
One can just email them. The moderators are listed here and the people in charge are listed starting here.
They seem fairly quiet and secretive. People with problems report getting little response. When I emailed some of the leaders about the virtues of getting a backup hosted outside the US, they never replied. They did develop such a backup system. But I'm not claiming they did that in response to my email.
It probably helps a lot to contact someone in charge whom you know personally. I could contact Jacques Distler, for example. I'm betting he'll say they are already familiar with the AI problem.
I find the lack of any official statement from arXiv on the matter a little disappointing, if they are aware of it.
They tend to talk as little as possible.
I'm a smidge nervous that this is getting awfully close to the point of someone like this actually getting a grant from a place like Templeton.
John Baez said:
We contend that the result—autoequivalence with the Monster Group—is statistically improbable unless the theory holds validity. Consequently, AI consensus would not have been achieved erroneously.
My eyes are bleeding :skull::skull:
'What are the chances I lost the lottery with this very specific ticket? So low it must be the lottery which is wrong'
Yes, this would be hilarious if it were intended as a joke. I also like the misuse of "autoequivalence" - which usually means an equivalence of something with itself - in the claim that U-category theory (whatever that is) is autoequivalent with the Monster Group.
They tend to talk as little as possible.
[...] With its customary discretion, the Company did not reply directly; instead, it scrawled its brief argument in the rubble of a mask factory. [...]
Blimey, it looks like some overcomplicated topics for 1st year students.
Screenshot 2025-08-05 at 14.39.43.png
Here's a counter-point from twitter: "I don’t get why Arxiv containing slop is a bad thing. I mean sure it’s frustrating and annoying, but merely having been put on Arxiv should give a paper draft exactly zero additional credibility". I suppose I feel the same way; I always thought of the arxiv as simply a substitute for putting pdfs on a personal website, with endorsement meant to keep the arxiv from turning into a public API for storing PDFs as opposed to technical vetting.
arXiv is a tool for researchers. It becomes useless if there is absolutely no moderation. If anyone can upload whatever they like, you might as well refer to the Library of Babel instead.
"I don't see why this thing that two bad adjectives apply to is bad"
Besides being frustrating and annoying: Almost everything on arXiv is real research, which is a huge benefit for discoverability, and does actually mean that something being on arXiv increases its credibility far over "a random thing on the Internet." Both those values could be mostly destroyed by too high a slop ratio.
Yes, I too don't want all the pseudoscientific vomit in the world to be on the arXiv. That's called the internet.
When I search for papers with a given keyword on the arXiv, I want a majority of them to actually make sense! And they do, so far.
The arXiv is a tremendously useful tool, which I use several times a day, largely because of what's not on it.