You're reading the public-facing archive of the Category Theory Zulip server.
To join the server you need an invite. Anybody can get an invite by contacting Matteo Capucci at name dot surname at gmail dot com.
For all things related to this archive refer to the same person.
I have talked to chatGPT about my work. I have asked for clarification from it on several subjects. I have made sure it doesn't pander to me. It has suggested structures to me. I can't check it. Would anybody be interested in checking the work it did?
I would like to second @Kevin Carlson and say that ChatGPT is currently incapable of doing research-level reasoning or writing without constant monitoring by an actual researcher with a good understanding of what it's trying to do and the means by which it's trying to do it. You can't just have it do a big lump of work that you don't know how to check and get it checked afterwards and expect the result to be anything but worthless.
Further, asking professional researchers to check "your" research that was done by ChatGPT is looking down on them, and even asking ChatGPT to do the things you ask of it is arrogant on your part. You're trying to get credit for being a "researcher", or perhaps trying to drive research agendas, when you're only actually doing what amounts to brainstorming, which is less than 10% of the work of research when you do it correctly, and way less than that if you don't understand enough to know when the ideas you come up with are any good. If you want to be a researcher, put in the work and become capable of understanding the concepts you're trying to use. There are plenty of university programs, and plenty of free learning materials for the autodidact. If you're not capable of doing that, there are many ways to contribute to the world other than research.
But what does it mean you "can't" check it? If that helps make the point James is making in the comment above, paste a piece of slop and let's see if a fraction of it is salvageable. It also depends on what exactly you're doing. There's a good and a bad way of using the tool...
If you are willing to pay people professional rates, perhaps, and I personally would set them high. There is, otherwise, zero incentive for people to read and critique apparently unguided LLM output (i.e. hallucinations).
Surely there's someone out there who'd be willing to earn money by checking whether work done by ChatGPT is correct. The question then becomes whether you can trust that person. They may just feed it into an AI. :laughing:
I would love to offer hard cash for help. Hopefully that will be possible soon. It would be interesting to see if, when used honestly, the robot can help out researchers who are not great with math to get their work up to a certain standard. Then getting it checked for a fee can be a final step for that researcher.
It's already been seen that the robot can't do that. One of the biggest problems with all LLMs so far is that they don't have an internal concept of confidence to connect to how confident the text sounds, so they can only tune that tone to external cues, like how confident the user seems to be in their results. The LLMs themselves are "not great with math", but they don't know it. It becomes a process of the blind leading the blind, and the results are not good.
The reasons people want money from you to check your papers are twofold: 1) they think they will be wasting their time (the papers will not be salvageable), and 2) you are not offering the usual currency in which this kind of work is paid for, namely co-credit and contribution of the result to one's own research program. You won't be able to participate in research of any kind as an insider until you fix both of these problems.
I can relate to the challenge of wanting to do interesting research, but struggling to know how to do it! In my experience, LLMs are not as helpful as one might hope for this purpose. Specifically, they tend to give very "zoomed out" conceptual ideas - and struggle to give precise details and examples that would help these vague ideas really come to life! They also tend to lose track of what exactly they were talking about, contradicting themselves in somewhat subtle ways in longer conversations. That quickly becomes a roadblock if one presses an LLM for details on some big picture idea it has.
If you would like to develop your research skills, I think learning from a human is still (by far!) the best way. Personally, I have found it quite helpful to ask questions relating to specific exercises in areas of interest on this zulip. Working through a specific problem in detail is a lot of work, and is less exciting in some ways than trying to dream up new research ideas. But I think the practice it gives with "zooming in" - seeing how abstract concepts can be realized in very specific examples/theorems/definitions - can be quite valuable for the goal of doing research in the future.
To illustrate what I was saying above, consider this definition from the document "Full_Theory_2.pdf" you uploaded in another thread:
Definition 2.1 (Instrument category). An instrument category is a small category C whose objects are operational configurations of an apparatus and whose morphisms are interface-allowed structure-preserving transformations between configurations.
This sounds interesting but very vague to me! What is an "operational configuration"? What is an "apparatus"? What is an "interface"? I suspect people have done a lot of work studying possible answers for all of these questions, and to sharpen the focus of your work, you might consider digging into some of those details. (For example, you might consider trying to do this with the work mentioned by Kevin Carlson in that other thread, who said: "Spivak's work on polynomial functors is a well-developed categorical theory of devices and their interfaces that you might build on, rather than starting from scratch".)
If you try to do this, you may find that there is some specific challenging-to-understand concept/approach/calculation/theorem used in prior work, which you would like to understand. Then, you could try asking a specific question about that thing on this zulip. That might lead to an interesting conversation helpful to your goal of doing research!
Yes, that's an example of a "Definition" that's not a mathematical definition until previous terms - "operational configuration", "apparatus", and "structure-preserving transformation between configurations" - have been given mathematical definitions... in a way that bottoms out in mathematical terms we can all look up on Wikipedia, like "category", "real number", "Hopf algebra", "compositum of fields", etc.
I am finding Claude Opus 4.5 to be useful in research, e.g. after a long and detailed conversation it more or less volunteered to write a Python program to find 5-limit intervals that are as close to 1 as possible for a given Tenney height. (If you don't know the definitions of those terms, and you care, you can read them on my blog article about this program.) I realized later that the program is flawed, in that the search domain was not large enough to be sure that the answer is right. But Claude announced it was going to search that domain and I okayed it thinking that the resulting answer would be close to correct. I checked some of the results and they look reasonable. I should go ahead and fix the program to expand the search domain, but even now it was a valuable step forward.
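For anyone curious what such a search looks like, here is a minimal sketch of the idea (my own illustrative reconstruction, not John's actual program; the function name and parameters are made up). Note that the exponent `bound` is exactly the kind of search-domain assumption John describes: if it's too small, the reported optimum can be wrong.

```python
from fractions import Fraction
from math import log2

def near_unison_5_limit(bound=7, max_height=10_000):
    """Enumerate 5-limit ratios 2^a * 3^b * 5^c with |a|, |b|, |c| <= bound,
    keep those whose Tenney height (numerator times denominator, in lowest
    terms) is at most max_height, and sort by distance from 1 in cents.
    The exponent bound is a guess: too small a bound can miss the true optimum."""
    found = []
    for a in range(-bound, bound + 1):
        for b in range(-bound, bound + 1):
            for c in range(-bound, bound + 1):
                num = 2 ** max(a, 0) * 3 ** max(b, 0) * 5 ** max(c, 0)
                den = 2 ** max(-a, 0) * 3 ** max(-b, 0) * 5 ** max(-c, 0)
                r = Fraction(num, den)  # already reduced: distinct primes
                if r <= 1 or num * den > max_height:
                    continue
                found.append((1200 * log2(r), r, num * den))
    found.sort()
    return found

# With these parameters the winner is the syntonic comma, 81/80 (~21.5 cents).
print(near_unison_5_limit()[0])
```

Restricting to ratios greater than 1 avoids listing each interval twice (as r and 1/r), since both lie at the same distance from a unison.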
I'm also finding it helpful to use Claude to learn mathematics - right now, the mathematics of regular tuning systems as developed by Gene Ward Smith, an algebraist I knew slightly, who died of COVID in 2021, whose writings on music theory are unfortunately unpublished, and scattered in a number of chat forums and wikis. I learned about 'Tenney height' from Claude, and now I'm learning about other concepts, and it's a lot of fun.
However, in my interaction with Claude I'm constantly questioning it when I don't completely understand something, or doubt something, and comparing its calculations to my own, and comparing its definitions to those I can find elsewhere. And I am not eager to have it make large-scale decisions about what to do.
Here's a sample interaction:
JB: Is a temperament the kind of surjective homomorphism you mentioned, or merely its kernel, as I first guessed?
Claude: That's a good question, and I think I was being imprecise.
Two homomorphisms with the same kernel can differ by an automorphism of the target. If φ: ℤⁿ → ℤʳ and ψ: ℤⁿ → ℤʳ have the same kernel, then ψ = A ∘ φ for some A ∈ GL(r, ℤ).
In the regular temperament paradigm, I believe the standard usage is:
The temperament is the kernel (or equivalently, the equivalence class of homomorphisms sharing that kernel). This is what the wedgie captures invariantly.
A mapping is a specific homomorphism—a choice of how to express the temperament in terms of generators. The "standard icon" [⟨1 2 4|, ⟨0 −1 −4|] on Gene Ward Smith's page is a mapping: it tells you not just which commas vanish, but which specific generators to use (period and fourth).
Different mappings for the same temperament correspond to different choices of basis for ℤʳ—i.e., different choices of generators. For meantone, you could use (octave, fifth) or (octave, fourth) or (octave, major second); these give different mappings but the same temperament.
So my earlier statement "a temperament is a surjective homomorphism" was too strong. The temperament is really just the kernel; the homomorphism (mapping) is additional structure you impose when you want to work with specific generators.
In short, I caught it in a mistake, and it fessed up.
(The term 'wedgie' sounds silly, but it's an invention of Gene Ward Smith, who used wedge products to describe submodules of ℤʳ.)
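The kernel claim in that exchange is easy to sanity-check by hand. A minimal sketch (my own illustration, not from the chat; the helper names are made up): the meantone mapping from the quoted "standard icon" sends the syntonic comma's monzo to zero, and composing with any matrix in GL(2, ℤ) changes the mapping but not the kernel.

```python
# Vals act on monzos: the exponent vector (a, b, c) of 2^a * 3^b * 5^c.
def apply(mapping, monzo):
    return [sum(r * m for r, m in zip(row, monzo)) for row in mapping]

def compose(A, M):
    """Left-multiply the 2x3 mapping M by a 2x2 integer matrix A."""
    return [[sum(a * m for a, m in zip(row, col)) for col in zip(*M)]
            for row in A]

meantone = [[1, 2, 4], [0, -1, -4]]   # the mapping [<1 2 4|, <0 -1 -4|]
comma = [-4, 4, -1]                    # 81/80 = 2^-4 * 3^4 * 5^-1

print(apply(meantone, comma))          # [0, 0]: the comma is tempered out

# A shear in GL(2, Z) (determinant 1) gives a different mapping for the
# same temperament; the comma still vanishes because the kernel is unchanged.
A = [[1, 1], [0, 1]]
other = compose(A, meantone)           # [[1, 1, 0], [0, -1, -4]]
print(apply(other, comma))             # [0, 0]
```

This is exactly Claude's corrected point: the temperament is the shared kernel, while each specific matrix is a choice of mapping.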
I think that AI doesn't give you knowledge but instead access to the limit of your own reflective capacity
https://mathstodon.xyz/@mathemagical/115993612620433903
Also, this classic from Ronald Fisher
To consult the statistician after an experiment is finished is often merely to ask him to conduct a post mortem examination. He can perhaps say what the experiment died of.
Thinking that one can ask a professional mathematician as the final step of checking after cycling random ideas through a bunch of LLMs to "check" the results amounts to the above.
And in case it's not obvious, the error John pointed out in the LLM chat is elementary. He had the correct idea, it seems; the LLM bluffed some nonsense at the level of an undergraduate learning about groups for the first time without enough of a grasp on the material, and John knew enough to push back, and only at that point was the LLM pushed into a space of text prediction that included actually writing about the error and stating the (original!) correct idea. Now imagine this slip amplified by dozens or even hundreds of non-corrected cycles all building on each other.
There’s also the difficulty of even getting into a space where you’re getting the LLM to produce statements that have any truth value, false or no, rather than just vibes.
I’ve also been finding them useful for similar purposes to John, to be sure. A particularly nice use is trying to find the reference for some particular thing I know is true but that’s buried in unfamiliar Australian papers from the ‘80s. That sometimes works well. I think a common thread is that when these devices are useful, they are saying things you can check yourself, and most often something you could produce yourself by a predictable process, given enough time and energy.
Our current belief is that, as a general principle, mathematics papers should always be written by humans, even when the AI-generated content is (fully) mathematically correct. The reason is that authorship entails accountability for both mathematical precision and expositional integrity, such as the correctness of attributions, which is a responsibility that only humans can bear.
https://arxiv.org/abs/2601.22401
The above was from an expert team using AI to solve problems posed by Erdős, for those that do not click through, so they are very much not anti-AI.
Experts get improved efficiency/time saved in exchange for their expertise (presumably) and greater energy consumption - the system as a whole is far less efficient, albeit faster. Meanwhile, amateurs and autodidacts have their time wasted and students have their cognitive abilities eroded while crumbling public education systems are unable to adapt due to being systematically underfunded. None of it really seems worth the cost to me.
I would like to highlight just the mathematical construction that chatGPT provided, with some proof outlines it offered. Is anyone open to seeing this here in this thread? It is something that could be checked easily without having to read a tough physics paper.
Did you not read the thread? John B is thoroughly trained in physics. I have a physics degree. I'm not sure "tough physics paper" is the sticking point at all, and by insinuating we would have difficulty with LLM physics I personally feel it's mildly insulting.
No, I think what he's insinuating is we would have a tough time with his own "big" physics ideas with no mathematical content. And TBH, watching someone try to muddle their way through doing what's probably the interpretation of quantum mechanics with poor math skills probably would be pretty tough going. (I found this more than mildly insulting.)
No one wants to read my physics papers. But I can give a pure mathematical construction that I have "found" and we can talk about it. If you keep hearing insults, maybe the forum is toxic. I don't know. Nothing was insinuated.
If you actually want to talk about my program in physics, be explicit. I will engage. That is not what I have seen on math forums, ever.
In any case, this thread raises an interesting, honest, process that fits with the coming of the new technology. I appreciate everyone's input.
Ben if you straight up give me $1000 I will read and honestly critique chatGPT's output for you.
I would love that! :smile:. Give me some time, I'm still working on getting this startup going.
Meanwhile, I will add all the definitions in the way John B was saying, needing to be rooted down into something that can be easily checked on Wikipedia.
No, not Wikipedia, an actual mathematics textbook.
Sure, that would be fine
Because Wikipedia sometimes gives a vague waffly "definition" that people try to use, and which is not precise enough to even be wrong.
By the way, the $1000 gets one lot of feedback, and scales upwards with each consult.
Let's put it all on here so the process can be viewed by everyone. It could be improved and the process itself can be critiqued.
I would write a blog post, I don't think people want to read that stuff here.
I'm not interested in having a reproducible "process".
It's about seeing how the off-Broadway physicist can use the chatbot (hopefully correctly) and how this can dovetail with actual human checking.
David Michael Roberts said:
I'm not interested in having a reproducible "process".
It's for others to see it all happening. It has to go here.
My prior is, based on discussion here, the other thread, and my years of experience, that you may well be disappointed, but a thousand dollars buys you one set of honest, unbiased technical feedback, with no additional rounds of comments.
If you want my actual honest expert opinion that badly, you have to accept my terms. I don't think that discussing the document in detail here is fair to my colleagues.
It will be public, though.
However, since I (and others) have already given you free expert advice about mathematics for years now, and a lot of it hasn't been acted on, please be aware that I'm done with anything except a transactional (possibly one-off) exchange.
I'm also going to be straight in case anyone else is reading this in the future: this offer is not something anyone else can expect. I have better things to do with my life than check LLM output, like my own research, which I get precious little time to do.
Would anyone else like to offer this as a service?
It would not be a discussion thread. Just posting the analysis and then how I can manage the analysis.
David implied that it would be an easy job to interpret the paper. That is not true. This will come with a steep need to understand foundations in quantum theory, QG, QFT.
And disparate advanced work.
David implied that it would be an easy job to interpret the paper.
I said that you implying people here might find the physics "tough" underestimates the physics background of some people here. I'm only offering to critique the mathematics, however, since while I do have a physics degree, I'm not going to charge anyone money for my thoughts there, even if I can understand the physics, since I don't call myself a physicist, and I don't want to open up a can of worms there. I recognise the limits of my expertise and want to steer clear of them.
And I'm not posting any comments here. That's my offer: a detailed blog post critiquing the mathematics, with a link to it from here, or nothing.
Moreover, I'm happy to permanently rescind this offer if too much arguing happens beforehand.
I'm also mindful of the advice in this post https://academia.stackexchange.com/a/18570/8881 about how one should interact with an academic assessing big claims.
Understand that she is not required to help you. If she is helping you she is doing so out of generosity. She has a right to stop it whenever she pleases without any explanation. Be mindful of her time, do not waste it for what you could/should have done yourself, try to make her job in helping you as easy as possible, and do not do anything that will make her regret trying to help you.
With the caveat here that if you are paying money, you will get precisely what you pay for.
And until that money is present and ready to be transferred, I won't be discussing more details or meta-details about this, so please don't reach out until then
Yes, so again, does anybody else want to start this process with me? I need to get money ready, so that will take time. The idea is that we have a poorly trained category theorist and a good physicist, me, using chatgpt to learn about their work. They get some feedback from the LLM and they cannot check it. The feedback is category theory and physics. The professional category theorist, that's you, is paid a fee to read the work and check the construction, the pure math, and if they are also a well trained physicist, they can comment on the physical theory.
The analysis is put into a document and pasted here. I try to absorb the analysis and then give a comment or an updated document. No back and forth. Just two blocks of work. You set a reasonable price. I will want to take all the analysis and put it wherever I want: MathOverflow, a web page, a full document posted to ResearchGate, etc.
.....
Well, look, think about it everyone. This is going to become an increasing part of research and I am happy being the guinea pig. I think it would be fascinating for anyone to try this with me. Plus hard cash for you! :stuck_out_tongue_wink:
I will want to take all the analysis and put it wherever I want: MathOverflow, a web page, a full document posted to ResearchGate, etc.
No. I will not give permission for you to re-post any analysis I might do on your own outlets, it would be solely for your own private benefit.
You are essentially asking for copyright in the text of the report, and I will be only too happy to charge extra accordingly.
That you think this is all worth a winking and tongue poking out emoji tells me you are not taking this sufficiently seriously.
new job prospect for category theorists: siphon VC money by entertaining LLM-fueled crackpots' ideas.
although i guess some people here have already been doing that for some years :thinking:
@Ben Sprott what kind of feedback do you want, exactly? The first definition you write doesn't mean anything...