Category Theory
Zulip Server
Archive

You're reading the public-facing archive of the Category Theory Zulip server.


Stream: practice: communication

Topic: The AI "Scientist"


view this post on Zulip Morgan Rogers (he/him) (Nov 25 2024 at 16:31):

My wife sent me this, a paper heralding "the AI scientist", an ensemble of AI tools allegedly capable of automating the whole research ecosystem. For those on Bluesky, this has been critiqued admirably by Carl T. Bergstrom, but it is a sign that scientific communities need to move fast to avoid being submerged by a tidal wave of AI-generated junk papers. I know from direct experience that these are already showing up in isolation, but the construction of a system for rolling them out is an alarming development.

Since several organizers of conferences and editors of journals are present here, I wanted to launch a discussion of what we can collectively do to manage this trend.

view this post on Zulip Jean-Baptiste Vienney (Nov 25 2024 at 16:59):

Ok, it doesn’t work well now. But it is not impossible that AI will end up replacing researchers ahah (however ChatGPT says it will not happen tomorrow).

view this post on Zulip Jean-Baptiste Vienney (Nov 25 2024 at 17:09):

I would have liked more details about what Carl T. Bergstrom had in mind when he wrote: "Given what I know of fundamental limits to what LLMs can do, I see no reason to agree."

view this post on Zulip Jean-Baptiste Vienney (Nov 25 2024 at 17:10):

What are these fundamental limits please?

view this post on Zulip Jean-Baptiste Vienney (Nov 25 2024 at 17:12):

I think nobody knows whether there are such fundamental limits: maybe there are, maybe there aren't.

view this post on Zulip Ryan Wisnesky (Nov 25 2024 at 17:18):

One way to understand LLM limitations and abilities is to actually examine their code. GPT2 is only 400 lines of Haskell (https://github.com/tensor-fusion/GPT-Haskell), and is also available as an Excel spreadsheet (with training data) and a giant recursive SQL query. You can see for yourself whether the code is "thinking" or "averaging" (my favorite metaphor) or something else entirely. And then ask and answer math questions about the computational model, such as whether it can compute all functions, etc. I understand many LLM architectures cannot be 'Turing complete' for example, while others are.
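
To get a concrete sense of what "examining the code" looks like, here is a minimal, self-contained Haskell sketch of one attention step (a toy illustration, not code from the linked repository): the scores are dot products of a query with the keys, a softmax turns the scores into weights, and the output is the weighted average of the value vectors, which is where the "averaging" metaphor comes from.

```haskell
-- Toy sketch of a single attention "read" (not taken from the linked repo):
-- dot-product scores, softmax weights, weighted average of the values.
type Vec = [Double]

dot :: Vec -> Vec -> Double
dot xs ys = sum (zipWith (*) xs ys)

softmax :: [Double] -> [Double]
softmax xs = map (/ total) exps
  where
    m     = maximum xs                  -- subtract the max for numerical stability
    exps  = map (\x -> exp (x - m)) xs
    total = sum exps

-- One attention head applied to a single query over lists of keys and values.
attend :: Vec -> [Vec] -> [Vec] -> Vec
attend q keys values = foldr1 (zipWith (+)) weighted
  where
    weights  = softmax (map (dot q) keys)
    weighted = zipWith (\w v -> map (w *) v) weights values

main :: IO ()
main = print (attend [1, 0] [[1, 0], [0, 1]] [[10, 0], [0, 10]])
-- prints approximately [7.31, 2.69]: a blend leaning toward the first value
```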

view this post on Zulip Morgan Rogers (he/him) (Nov 25 2024 at 17:30):

The problem is not whether AI could eventually produce publication-worthy papers (setting aside the fact that some AI-written papers have already been accepted by journals); it's that we have neither the human time nor sufficient confidence in existing AI to review papers that this technology can produce at a much higher rate than humans can write them.

The aim of science is not to manufacture knowledge, it is to achieve understanding, whether by humans or (in the AI fantasy future) by systems capable of leveraging that knowledge for our benefit; to me, the latter is certainly a lower priority. Having large volumes of imitation academic papers whose content we cannot trust, because we don't have experts with enough time to look at them, will serve only to undermine the scientific process as a whole. Our jobs are on the line, not because we are going to be replaced but because our good-faith research output risks being drowned out by this firehose.

view this post on Zulip JR (Nov 25 2024 at 17:35):

Ryan Wisnesky said:

I understand many LLM architectures cannot be 'Turing complete' for example, while others are.

Very general communication complexity considerations (specifically, relating to the computation of the attention softmax) show that a single transformer attention layer cannot reliably compute large function composition queries. An example of a composition query is the concatenation of A, B, and Q, where A is of the form “Alice’s mother is Andrea; Bob’s mother is Betsy; …”, B is of the form “Andrea is a teacher; Betsy is a surgeon; …”, and Q is of the form “What is Alice’s mother’s job?” The linked paper shows that if A and B are large—in a way irrelevant to context windows—then a transformer cannot answer Q reliably. 
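
To make the notion of a composition query concrete, here is a tiny Haskell sketch (using the names from the example above; it is not code from the paper): answering Q amounts to composing two lookup tables, job-of after mother-of, and it is exactly this two-hop composition that the bound says a single attention layer cannot carry out reliably when the tables are large.

```haskell
import qualified Data.Map as M

-- The facts in A and B, written as lookup tables (names from the example above).
motherOf :: M.Map String String
motherOf = M.fromList [("Alice", "Andrea"), ("Bob", "Betsy")]

jobOf :: M.Map String String
jobOf = M.fromList [("Andrea", "teacher"), ("Betsy", "surgeon")]

-- Answering Q means composing the two lookups: first find the mother, then her job.
motherJob :: String -> Maybe String
motherJob person = M.lookup person motherOf >>= \m -> M.lookup m jobOf

main :: IO ()
main = print (motherJob "Alice")  -- Just "teacher"
```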

The same paper shows that the computation of a transformer with L layers on a prompt of length n can be performed using O(L log n) bits of memory under various polynomial scale assumptions. As a corollary, multi-layer transformers cannot solve the notoriously easy problems 2-SAT or Horn-SAT at scale unless L = NL (the complexity classes here, not the layer count).

On the other hand, the representational power of multi-head transformer architectures is bounded below by n-gram models, which can perform impressively in practice.

view this post on Zulip Jean-Baptiste Vienney (Nov 25 2024 at 17:38):

Morgan Rogers (he/him) said:

Having large volumes of imitation academic papers whose content we cannot trust, because we don't have experts with enough time to look at them, will serve only to undermine the scientific process as a whole. Our jobs are on the line, not because we are going to be replaced but because our good-faith research output risks being drowned out by this firehose.

I think it will not be too difficult to identify whether a paper is submitted by a researcher or an AI. It suffices to use Google to find whether the name of the author corresponds to a real person or not. But to deal with the cases where the author does not have much of an online presence, and to save time, it could be helpful to oblige authors to use ORCID identifiers.

view this post on Zulip Peva Blanchard (Nov 25 2024 at 17:41):

It's going to be difficult. Off the top of my head, I can only see band-aid, police-like actions:

Maybe keep an eye on watermarking technologies (although there are some "conflicts of interest" ...)
I am, unfortunately, not very optimistic for the mid-term: I bet there will be an arms race between generators and detectors.

view this post on Zulip Morgan Rogers (he/him) (Nov 25 2024 at 17:43):

Jean-Baptiste Vienney said:

I think it will not be too difficult to identify whether a paper is submitted by a researcher or an AI.

The question is not one of detecting them, but of putting a policy in place for what to do with them, and under what conditions; that's what I want to discuss.

view this post on Zulip Jean-Baptiste Vienney (Nov 25 2024 at 17:44):

Morgan Rogers (he/him) said:

The aim of science is not to manufacture knowledge, it is to achieve understanding

But what really is understanding? It is a vague notion. Knowing what is true or false, or what the answers to precise questions are, is sufficient.

view this post on Zulip Morgan Rogers (he/him) (Nov 25 2024 at 17:51):

Sufficient for what? I'm not interested in just "knowing what is true or false or what are the answers to precise questions". If all you learned at school was a series of statements and whether they were true or false, it must have been quite an unpleasant experience.

There are further significant ethical questions around plagiarism, considering that a number of medium-sized academic publishers are selling books and research papers to AI companies for their training data.

view this post on Zulip Jean-Baptiste Vienney (Nov 25 2024 at 18:07):

Morgan Rogers (he/him) said:

Sufficient for what? I'm not interested in just "knowing what is true or false or what are the answers to precise questions". If all you learned at school was a series of statements and whether they were true or false, it must have been quite an unpleasant experience.

I mean sufficient for practical purposes. If you can have answers to "What is a good plan to optimize the pleasantness of my day today while making my life as long as possible?", "Should I study this math question or this one? Take a nap this afternoon? Go for a walk at 10am? Please don't be too precise, and leave me some choices to make so that I can use my free will, which makes me feel better", etc... what else do you want?

I agree that being able to ask questions and make analogies is also something very cool to learn.

view this post on Zulip Morgan Rogers (he/him) (Nov 25 2024 at 18:30):

It sounds like you already go to AI to get answers to those questions. They don't seem to be the kind of questions that AI could possibly answer scientifically or say definitively are "true or false" so they don't seem particularly relevant to the problems I want to discuss here.

view this post on Zulip Jean-Baptiste Vienney (Nov 25 2024 at 18:42):

I'm done. I'll let you discuss whatever is relevant to your problems. :)

view this post on Zulip David Egolf (Nov 25 2024 at 19:50):

It's an interesting question. I believe that AI-generated scientific papers are currently (mostly?) junk quality, but I also think it is somewhat likely that AI will be able to generate decent-quality papers within the next decade. For me, "How do we manage a flood of junk papers?" is a relevant question. But I suspect that it may be replaced by a broader question within a decade or so: "How can humans remain the judges and managers of scientific knowledge?".

To speak to the specific topic raised, I think it might be helpful to draw an analogy to online gaming communities, where computer programs imitating real players are already a huge problem. Player characters controlled by programs are usually called "bots", and it's an unending struggle to (1) detect and ban bots and (2) mitigate the impact of the bots that inevitably escape detection. Here are two measures that have - to my understanding - helped in a gaming context:

Analogous measures in the context of paper submission could include:

All of these measures have serious problems with them!

view this post on Zulip Joe Moeller (Nov 25 2024 at 21:01):

I hadn't even thought about bot authors being the problem. I assumed it would be real people using AI to pump out papers in their name, and then submitting them themselves. What benefit is gained by inventing a fake person and having them produce fake work? I must misunderstand something.

view this post on Zulip Peva Blanchard (Nov 25 2024 at 21:19):

Joe Moeller said:

What benefit is gained by inventing a fake person and having them produce fake work? I must misunderstand something.

I can imagine someone wanting to hack the "citation game" by having a lot of fake people publish fake papers that cite your work.

Another is simply a DoS attack: preventing a journal from publishing in order to damage its reputation. (I don't know how plausible this scenario is, but who knows.)

But your question raises, more or less, the problem of defining the threat model.

view this post on Zulip David Michael Roberts (Nov 25 2024 at 21:22):

I assume the premise at the start of this thread is not autonomous bots generating PDFs that look like papers, but humans using a genAI-powered tool to rapidly generate such artefacts and also handle the submission process, so that the speed of nonsense generation can go up an order of magnitude. In some countries there are very strong incentives, even financial ones, to maximise certain metrics

https://mathstodon.xyz/@highergeometer/113541639662531977

and there are people who feel that is more important than other scholarly considerations.

view this post on Zulip David Michael Roberts (Nov 25 2024 at 21:24):

If the AI-generated text is nearly indistinguishable from solid work, it still needs to be refereed. Even if it is correct, it still needs to be refereed. Unless the AI tools can actually referee papers too, which I suspect is even harder than generating them, the problem I believe Morgan is highlighting is one of "humans can do the work, so that machines have time to think", to quote B(if)tek.

view this post on Zulip David Michael Roberts (Nov 25 2024 at 21:29):

Trust in the literature is important to maintain, at least at or above current levels. This was something psychology had to grapple with a bit over a decade ago, with the replication crisis, and that was at a regular human pace of generating flaky papers.

view this post on Zulip Joe Moeller (Nov 25 2024 at 21:33):

Peva Blanchard said:

Joe Moeller said:

What benefit is gained by inventing a fake person and having them produce fake work? I must misunderstand something.

I can imagine someone wanting to hack the "citation game" by having a lot of fake people publish fake papers that cite your work.

Oh yeah, you could even do this without accosting a journal. You just have to accost the arXiv, and Google will update your h-index.

view this post on Zulip David Egolf (Nov 25 2024 at 21:43):

Joe Moeller said:

I hadn't even thought about bot authors being the problem. I assumed it would be real people using AI to pump out papers in their name, and then submitting them themselves. What benefit is gained by inventing a fake person and having them produce fake work? I must misunderstand something.

That's a good point! I was primarily intending to draw an analogy between botting and submission of AI-generated papers. And I think it can be interesting to contemplate such an analogy. That being said, I did drift a little bit in my post above - I started considering, at least to some extent, the problem of AI-generated papers being submitted under fake credentials. I don't think that's a big problem currently; perhaps it could become a larger problem in the future if AI systems start being given additional agency.

view this post on Zulip Peva Blanchard (Nov 25 2024 at 21:48):

I got curious and browsed arXiv's blog with the keyword "ChatGPT". I found one result announcing their updated content moderation policy.

Here is the excerpt.

Policy for authors’ use of generative AI language tools

view this post on Zulip David Michael Roberts (Nov 25 2024 at 23:27):

David Egolf said:

I started considering, at least to some extent, the problem of AI-generated papers being submitted under fake credentials. I don't think that's a big problem currently; perhaps it could become a larger problem in the future if AI systems start being given additional agency.

I think people would rather garner the credit/brownie points/citations/h-indices etc. for themselves than submit papers under a fake identity and get no benefit from whatever metrics purport to measure scholarly output.

view this post on Zulip John Baez (Nov 26 2024 at 01:40):

Yes, unless there are purely malicious actors who want to bring science to its knees by flooding journals with second-rate papers. Which is actually an interesting thought, at least good enough for a short story. But it's easier to see people cranking out piles of papers and submitting them to journals under their own name.

view this post on Zulip Peva Blanchard (Nov 26 2024 at 07:42):

@Joe Moeller and I discussed above the case of a malicious actor (a real human) making a lot of fake papers under fake names to cite their own work. It is even possible to do that with the arXiv and have Google update your h-index.

By the way, when imagining scenarios of flooding journals with second-rate papers, I don't think they are very unlikely for some organizations. For instance, the "science" produced by tobacco companies has been used to drown out findings on the impact of smoking on health (other examples would include, I guess, oil companies and climate change, or more recently social networks and data rights). GenAI tools have lowered the bar for such coordinated manipulation.

view this post on Zulip Daniel Geisler (Nov 26 2024 at 08:14):

As a moderator I dealt with a series of odd postings from a new user who offered to answer any open question in mathematics. Instead of quickly figuring out what was going on, I ended up going down a deep rabbit hole. I began to get so many more strange posts from new users that it bogged down the entire moderation process. I suspect the cases discussed here don't cover the variety of motivations in play.

view this post on Zulip Morgan Rogers (he/him) (Nov 26 2024 at 10:00):

It seems we have understood some of the reasons why this is very likely to become a problem, and for those of us in academia I again want to stress that it is a problem that will directly affect us. So what do we do about it?
The policy @Peva Blanchard quoted seems sensible as a first step: oblige authors to disclose their use of GenAI. That is obviously hard to enforce if the use is sufficiently obfuscated. For similar reasons, an individual policy of "I will not review AI-authored articles" is hard to apply consistently, although I am considering putting a statement to that effect on my personal webpage anyway.

An approach which might work is a "one strike policy" against mistakes that AI is known to make but which a human could only make with significant effort or deception. These include:

Checking any of these things beyond the requires significant effort.

view this post on Zulip Morgan Rogers (he/him) (Nov 26 2024 at 10:05):

Again, that's just at the base level of checking whether the papers are formally correct; none of that does much to staunch the potential volume or guarantee the scientific value of the papers produced (the latter being potentially very subjective).

view this post on Zulip Matteo Capucci (he/him) (Nov 26 2024 at 14:06):

Jean-Baptiste Vienney said:

I think it will not be too difficult to identify whether a paper is submitted by a researcher or an AI. It suffices to use Google to find whether the name of the author corresponds to a real person or not.

For the foreseeable future, the risk is that real people will use AI to submit junk to journals/conferences, not AI agents. Also, it is fairly easy to fake one's existence online... see bots on Twitter.

view this post on Zulip Matteo Capucci (he/him) (Nov 26 2024 at 14:06):

Jean-Baptiste Vienney said:

it could be helpful to oblige authors to use ORCID identifiers.

This is not a bad idea, at least because one could soft-ban authors who submit slop.

view this post on Zulip Matteo Capucci (he/him) (Nov 26 2024 at 14:36):

Morgan Rogers (he/him) said:

oblige authors to disclose their use of GenAI

I think this is a fine rule but also probably practically useless. Either AI papers are state-of-the-art crackpot junk that a reviewer can spot by quickly skimming the paper, or they are significantly better at pretending to be substantial. In the first case, if someone is willing to submit such slop I doubt they would disclose their AI usage, so you would end up reviewing these papers anyway. In the second case, the incentive to lie and pass the work off as your own is much higher, and you would end up reviewing these papers anyway.

In any case, asking people to 'not do the bad thing' usually does little to stop them from doing the bad thing. Instead, incentive and reputation systems can be used to shape behaviour, and academia has long relied (with varying results) on such systems. Citation metrics and networking are ways to measure and boost your reputation in a field, which is ultimately what makes your science relevant (if someone proves the abc conjecture in a forest and no one is there to hear it, did they actually prove it?), and people are disincentivized from dabbling in scientific fraud because they know they will completely burn their reputation if caught.

So I think the solution to the rising tide of AI slop is to build reputation networks we can trust: have public lists of authors who are known to be fraudulent, make submissions conditional on a reference (like arXiv does), etc. These are not perfect systems and I'm just pointing in a general direction: we should be very careful to avoid gatekeeping real people from disadvantaged/nontraditional backgrounds. But you get the idea.

view this post on Zulip Matteo Capucci (he/him) (Nov 26 2024 at 14:37):

Morgan Rogers (he/him) said:

An approach which might work is a "one strike policy" against mistakes that AI is known to make but which a human could only make with significant effort or deception. These include:

Checking any of these things beyond the requires significant effort.

These seem to be excellent tasks for AI, lol, but techbros seem to be more preoccupied with generating bullshit than with actually making helpful stuff (or, if they do, they are way quieter about it than the BS generators).

view this post on Zulip Morgan Rogers (he/him) (Nov 26 2024 at 14:49):

A cynical observer might identify a pattern of creating a problem in order to build a market for the solution... :melting_face:

view this post on Zulip John Baez (Nov 26 2024 at 15:08):

Morgan Rogers (he/him) said:

An approach which might work is a "one strike policy" against mistakes that AI is known to make but which a human could only make with significant effort or deception. These include:

Humans are guilty of the last two quite often in my experience. Maybe a good policy would be fairly merciless against behavior that seems 'dishonest', regardless of whether a human is doing it or an AI, while being merciful to things that feel like 'honest errors'.

Checking any of these things beyond the requires significant effort.

Beyond the what?

view this post on Zulip Morgan Rogers (he/him) (Nov 26 2024 at 15:30):

beyond the first :big_smile: it's fairly easy to check if citations exist, compared with checking whether they contain a given piece of information!
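
For what it's worth, the easy half of that check can even be scripted. Here is a hypothetical sketch (not an existing tool), assuming the references carry DOIs and using the public Crossref metadata API via the http-conduit package: an unknown DOI comes back as a 404, while a 200 only confirms the DOI resolves, not that the cited paper supports the claim attached to it.

```haskell
import Control.Monad (forM_)
import Network.HTTP.Simple (getResponseStatusCode, httpNoBody, parseRequest)

-- Hypothetical helper: look up a DOI against the public Crossref API.
-- A 404 suggests the reference does not exist (a typical symptom of a
-- hallucinated citation); a 200 only means the DOI resolves.
checkDoi :: String -> IO ()
checkDoi doi = do
  request  <- parseRequest ("https://api.crossref.org/works/" ++ doi)
  response <- httpNoBody request
  let status = getResponseStatusCode response
  putStrLn (doi ++ ": " ++ (if status == 200 then "found" else "not found (HTTP " ++ show status ++ ")"))

main :: IO ()
main = forM_ ["10.1000/xyz123"] checkDoi  -- placeholder DOI; substitute the DOIs from a reference list
```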

view this post on Zulip Mike Shulman (Nov 26 2024 at 18:52):

Matteo Capucci (he/him) said:

asking people to 'not do the bad thing' usually does little to stop them from doing the bad thing.

I don't think that's entirely true. In many cases it's important to state clearly what is considered to be bad, since some people may not want to do bad things, but may be unclear on whether something is bad or not. (Cf. academic integrity policies on syllabi.) Maybe that isn't relevant here, I don't know. But having a clear public statement of what bad behavior is is also helpful to justify punitive actions taken against those who engage in it.

view this post on Zulip Matteo Capucci (he/him) (Nov 27 2024 at 16:00):

Of course!

view this post on Zulip Morgan Rogers (he/him) (Nov 28 2024 at 08:14):

Mike Shulman said:

But having a clear public statement of what bad behavior is is also helpful to justify punitive actions taken against those who engage in it.

So where should we draw the line on what constitutes bad behaviour? Is it a matter of forbidding certain types of tools or limiting the volume/proportion of AI-generated material? Or are there other criteria to consider?

view this post on Zulip Chad Nester (Dec 04 2024 at 09:36):

I'd like to build on Morgan's earlier assertion that the aim of science is "not to manufacture knowledge, but to achieve understanding".

We might sharpen this a little to the assertion that the aim of science is "not to publish papers, but to achieve understanding".

This is relevant because the specific worrying thing that LLMs are capable of doing is generating research papers which are, at a glance, plausible. This is concerning largely because of the strain it might place on the formal peer-review system for publishing papers.

It seems to me that this is only a problem because of the many perverse incentives to publish as many papers as possible. One way to fix this problem is to change the way we evaluate scientific output. Currently, a significant part of the reason to publish papers is to accrue "science points", which are then used to make decisions about hiring etc.

I think making changes here would be a good idea anyway. I think our lives would be better, and the quality of our research higher, without the pressure to publish as many papers as possible.

So, it may be a good thing if AI completely destroys the publishing system. It might push us into the kind of collective action needed to reform the system into something less absurd.