Category Theory
Zulip Server
Archive

You're reading the public-facing archive of the Category Theory Zulip server.
To join the server you need an invite. Anybody can get an invite by contacting Matteo Capucci at name dot surname at gmail dot com.
For all things related to this archive refer to the same person.


Stream: theory: philosophy

Topic: About cashiers, baristas and mathematicians


view this post on Zulip Jean-Baptiste Vienney (Nov 02 2023 at 12:53):

John Baez [said]:

Yes, twenty years from now kids will be shocked that you never got the brain implants that do proofs for you and automatically upload your theorems to the web.

Jean-Baptiste Vienney [said]:

I'm not sure that it will happen. People could be replaced by machines in many jobs, but it doesn't happen; e.g., I don't know why we still have cashiers.

Personally, it's mostly from a theoretical point of view that I want math and computer science to be closer (off topic, ahah).

Damiano Mazza [said]:

It's ok, I deleted it, I'm not sure under which topic it would fit anyway :big_smile: (For those who haven't seen my comment, it was a reply to Jean-Baptiste Vienney's remark on why cashiers still exist, followed by a digression, nothing offending anyone but tremendously off-topic!).

Jean-Baptiste Vienney [said]:

I didn’t see it, but we could start a new topic to discuss this!

view this post on Zulip Jean-Baptiste Vienney (Nov 02 2023 at 12:54):

What did you want to say @Damiano Mazza ?

view this post on Zulip John Baez (Nov 02 2023 at 12:59):

I wanted to say that in the UK, cashiers are being replaced by machines. In many of the grocery stores I shop at - at least the ones in big chains like Tesco - there are very few cashiers and lots of automatic checkout machines. I use the cashiers when possible, mainly because I want people to be able to get jobs as cashiers if they want. The cashiers can solve problems the machines can't, but I suspect one reason for keeping cashiers is that old people like me prefer working with a person.

view this post on Zulip Jean-Baptiste Vienney (Nov 02 2023 at 15:01):

When I think about it, it's also true in Canada, especially in low-cost big chains like Dollarama and Loblaws. There is also an example where there are only machines: Uniqlo, a Japanese clothing chain. Moreover, their machines work better than the ones in Dollarama or Loblaws: you don't have to scan any barcode; it automatically detects the product by some magic when you put it in the dedicated place.

view this post on Zulip Damiano Mazza (Nov 02 2023 at 15:01):

In my original comment, I started by saying basically what John said above: cashiers are being replaced by machines (at least in Europe, and in particular in France, where I live). Human cashiers are kept probably because in some situations it would be impractical for customers to scan every single item they purchase. But there are stores that switched to full automation: the sporting goods store Decathlon has these crazy automatic cash registers in which you do not need to manually scan anything, you just randomly throw the items one by one in a basket. Human staff intervenes only if there's a problem (which has never happened to me so far, amazingly).

Then I went on to say that, more generally, in the current socio-economic model, the large-scale development and deployment of technological advances is, with few exceptions, driven by profit, not by a desire to actually increase the well-being of people. In these conditions, whenever a human costs less than a machine, the industry will go for the human. I don't think that the main reason why we got large language models passing the Turing test before we got coal-mining or tomato-picking robots is that developing the latter is inherently harder than the former. It has more to do, I think, with the fact that a slave costs less than a robot.

view this post on Zulip Damiano Mazza (Nov 02 2023 at 15:03):

Jean-Baptiste Vienney said:

[...] you don't have to scan any barcode, it automatically detects the product by some magic when you put it in the dedicated place.

Ah, you see, I was sure Decathlon wasn't the only one with this technology! (Sorry, I was writing my comment while you posted yours).

view this post on Zulip Damiano Mazza (Nov 02 2023 at 15:13):

(By the way, now you see that my comment had nothing to do with mathematicians either, so it manages to be slightly off-topic even here :joy:)

view this post on Zulip Jean-Baptiste Vienney (Nov 02 2023 at 16:24):

Yes ahah, but the big question for us is: Are mathematicians (or computer scientists) going to be replaced too by machines?

view this post on Zulip Morgan Rogers (he/him) (Nov 02 2023 at 17:21):

I certainly hope not and am sceptical of it being a possibility. Insofar as mathematicians are employed to perform calculations, they are potentially replaceable, but I don't think that's what most mathematicians consider themselves to be doing.

view this post on Zulip Damiano Mazza (Nov 02 2023 at 17:56):

I think that for the moment we are safe :big_smile: As Weyl put it, mathematics is at "the meeting point of constraint and freedom that is the very essence of human nature". For the moment, machines are able to exhibit some constrained creativity (for example, writing sonnets), but everyone agrees that these are very far from what is needed to do mathematics.

So here's a more specific rephrasing of the question: given more time, more layers, etc., will it be possible for future evolutions of our current LLMs to "figure out" the expected behavior for mathematical prompts as they have now "figured out" the expected behavior for prompts asking them to write sonnets? Will you be able, one day, to ask GPT 19 "Please prove the Riemann Hypothesis" and see it spit out a valid proof? Some people, apparently, think that this is a serious possibility. I like to think that it is not, but I have no valid reason for it; I guess I am just some kind of romantic :upside_down:

If really pressed to motivate my position, I would say that I think (because I am a romantic!) that our ability to ask questions must, at some level, be essential to our ability to answer them, and for the moment it seems that no one has figured out how to make a machine wonder about the world and ask itself questions...

view this post on Zulip JR (Nov 02 2023 at 18:31):

LLMs can't reason symbolically or predict or truly synthesize anything. They only mush together local sections. If you aren't asking much beyond their corpus or are very lucky, you can get a global section instead of gibberish. A different architecture is needed to get to mathematical reasoning. Maybe LeCun's joint embedding predictive architecture will yield fruit.

view this post on Zulip JR (Nov 02 2023 at 18:31):

Here I mean "predict" in a sense other than "predict the next word" of course.

view this post on Zulip Damiano Mazza (Nov 02 2023 at 20:15):

I am not an expert at all, neither theoretically nor practically (I have never interacted with any LLM and do not plan on needing to do so any time soon), but I am not sure I understand why it is relevant whether LLMs are truly "predicting" or "synthesizing" anything, maybe because I do not understand the technical meaning of these words.

An example: last year, I sent an email to my students with practical info about the final exam. The tone of the email was rather light and it contained a few jokes here and there, which encouraged one of the students (who, curiously, thought that I had used ChatGPT to produce the email!) to feed it to ChatGPT and ask it to write a response in sonnet form, from the perspective of a student. The result was quite impressive. Besides the fact that it was a well-formed sonnet, its content was perfectly consistent with my email, as if it had "understood" it.

If the student had written that sonnet, would you consider that they had actually synthesized something? Or predicted something? How would they have gotten the "global section"? How did ChatGPT get it? (I don't think it was by luck, and the "corpus" on which it was trained certainly did not contain my email). To me, what matters is that the reply looked like something written by a human being with a thorough understanding of English grammar and of the rules underlying the structure of a sonnet (meter, rhyme, length of stanzas, etc.), and of course who had read and understood my email, including the irony of it.

That was ChatGPT 3. How can we rule out, formally, that ChatGPT 28 (I keep appending "high" random numbers to it :big_smile:) will act like it "understands" the rules of logic, of mathematical reasoning, etc., and be able to spout text perfectly adhering to them, consistently with the question being asked?

view this post on Zulip Damiano Mazza (Nov 02 2023 at 20:16):

By the way, your allusion to local vs. global sections is intriguing! Is there really a sheaf-theoretic understanding of LLMs? (Or neural networks in general?).

view this post on Zulip Jean-Baptiste Vienney (Nov 02 2023 at 20:33):

Damiano Mazza said:

[...] If really pressed to motivate my position, I would say that I think (because I am a romantic!) that our ability to ask questions must, at some level, be essential to our ability to answer them, and for the moment it seems that no one has figured out how to make a machine wonder about the world and ask itself questions...

Whether we discover that machines can ask questions or not, the answer will be interesting. We're going to learn whether there is something really specific to human intelligence. Is there a principle of pure creation that only human beings are able to use or not?

Maybe it will be something that goes against determinism. I like to think that if the whole world is governed by deterministic principles, then we are not really free in our acts, because our mental states are determined by the ones a second before, by our whole environment, and by the laws of physics. In that case, we think that we make choices, but in the end this is an illusion. We live through the process of making a choice, but it is not really determined by us. If there is a creative principle which is not deterministic, it would be something that we don't understand for the moment.

On the other hand, I also like trying to understand humans as artificial intelligences, which is more original. Sometimes people act in a way that seems impossible for us to understand. For instance, they use a mailing list to communicate instead of speaking in a nice, adapted online environment like this Zulip. Why are they doing this? Because they have learned to communicate on the mailing list, and even though it's a quite obsolete system, they continue because their brains are programmed to think that it is good.

Sadly, this vision of humans as artificial intelligences seems closer to reality than a non-deterministic creative principle specific to humans, but I don't know. We'll discover later whether we are just very complicated machines or not. And remember that machine learning was invented by taking inspiration from our little neurons :upside_down:

view this post on Zulip Damiano Mazza (Nov 02 2023 at 20:50):

Jean-Baptiste Vienney said:

Is there a principle of pure creation that only human beings are able to use or not?

I don't know about that and don't know about the connections with non-determinism, but what I meant to say is that our intelligence seems to have an existence outside of responding in the "right way" to outside stimuli (like, for chat bots, giving answers to prompts). We somehow "respond to ourselves" too, typically, but not necessarily, when we seek the answer to a question we ourselves asked. This may be a key feature defining intelligence itself, and it is irrelevant, a priori, whether the underlying mechanism is deterministic or not.

What does an LLM do when it's not responding to a prompt? How does it spend its time when it is "all alone by itself"? Currently, these questions do not really make sense. As long as they do not make sense, we will not have artificial general intelligence, I think. Not in the sense that I would refuse to consider it as such, but in the sense that it will lack a fundamental feature allowing it to "grow" beyond predicting answers to prompts (in particular, doing mathematics!). This is of course just an opinion; I have no idea how to justify it even semi-formally.

view this post on Zulip JR (Nov 02 2023 at 21:03):

Damiano Mazza said:

By the way, your allusion to local vs. global sections is intriguing! Is there really a sheaf-theoretic understanding of LLMs? (Or neural networks in general?).

LLMs iteratively predict the next word in a sequence of words that starts with a prompt. The prompt is thus a local section that gets extended by the LLM. (So OK, we are dealing with a cosheaf if you don't feel like taking an opposite.) By construction, each local section of N-1 words followed by the predicted word is reasonably coherent. But for any hard prompt that it really tries to answer, the local sections will decohere, because the LLM cannot actually reason about its input or output. I.e., the sequence of words can be coherently extended locally, but not globally: while the syntax of the output is globally fine, the semantics is not, because the LLM doesn't actually know anything about semantics, and it forgets whatever's outside its window. Poetry doesn't require global reasoning, or any reasoning at all. The LLM just extends snippets based on other snippets it's seen.
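In code, the picture is roughly the following toy sketch (the lookup table is a hypothetical stand-in for a trained network, and the window K is tiny for illustration):

```python
# Toy model of the "local section" picture: an autoregressive predictor
# only ever sees a window of K tokens, so each step extends the sequence
# locally, with no global view of anything outside the window.

K = 2  # context window (tiny for illustration; real models use thousands)

# Hypothetical stand-in for the learned next-token function.
TABLE = {
    ("the", "cat"): "sat",
    ("cat", "sat"): "on",
    ("sat", "on"): "the",
    ("on", "the"): "mat",
}

def predict_next(window):
    # Extend a local section by one token; "<unk>" models decoherence.
    return TABLE.get(tuple(window), "<unk>")

def extend(prompt, n_steps):
    tokens = list(prompt)
    for _ in range(n_steps):
        tokens.append(predict_next(tokens[-K:]))  # only the last K tokens matter
    return tokens

print(extend(["the", "cat"], 4))
# ['the', 'cat', 'sat', 'on', 'the', 'mat'] -- each step is locally coherent,
# but nothing enforces coherence across distant windows.
```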

view this post on Zulip Damiano Mazza (Nov 02 2023 at 21:42):

Ok, thanks, I see what you mean! But then it seems like all that is needed is to extend the "window" of locality. For example, it is not true that "[p]oetry does not require global reasoning": the rhyme structure is, to a certain extent, global (it skips lines), and you also need to remember which line of which stanza you are writing, because they are of different lengths, etc. Of course, this means that ChatGPT 3's "window" was big enough to cover all this knowledge. In other words, a sonnet fits within a "local section". If your window is the length, say, of a whole book, then who knows what you can do... Maybe that's what LeCun's new approach allows us to do?

By the way, proofs only need to be locally consistent, in the sense that the validity of a proof may be checked by looking at each single inference rule. There is no "global reasoning" in formal proofs: all you need is that the proof tree starts with what you want to prove and ends on axioms... In a sense, proofs are easier than sonnets :big_smile: Of course, what's difficult is to find out which rule to apply next. But that applies to writing the next line of a sonnet too... I don't know, I'm just amazed at what LLMs are able to do, but that's probably because I am not an expert. Many experts I meet seem to be very dismissive (but then again I never met any expert working at OpenAI or anything like that...).

view this post on Zulip Matteo Capucci (he/him) (Nov 03 2023 at 08:47):

Jean-Baptiste Vienney said:

Yes ahah, but the big question for us is: Are mathematicians (or computer scientists) going to be replaced too by machines?

I believe a barista today and a barista 100 years in the future will do more or less the same job, while a mathematician won't.
What I see happening is AI tech augmenting and empowering our job way before it can completely replace us. Potentially, the empowerment will be so great as to offset replacement considerably. Actually, I don't think such replacement will ever occur!

The reason is that 'mathematics' means two subtly different things. The first meaning is the shallower one which we default to (Wikipedia's definition: "Mathematics is the study of representing and reasoning about abstract objects"). This is true, but it leaves out a big part of mathematics: it is not enough to sit in a room and churn out proofs; mathematics is inherently about the collective understanding of mathematical objects. It's not some form of meditation in which your internal experience is the only thing that matters. Mathematics is fundamentally about studying abstract objects together, and the advancement of maths always means a collective advancement, not an individual one.

So as long as humans remain relevant and interested in the world around them, they will want to study maths. And this entails a communal effort to distill, transmit and elaborate mathematical knowledge. That's why I believe future humans will be heavily assisted by machine reasoning, but they certainly won't stop doing maths.

view this post on Zulip Alexis Toumi (Nov 03 2023 at 10:44):

It's a funny choice to take baristas as examples while their job did not exist 100 years ago and they are getting replaced by machines much faster than mathematicians (a quick search led to e.g. this article on how AI is used in the coffee industry)

view this post on Zulip Alexis Toumi (Nov 03 2023 at 10:46):

I really like this idea of mathematics as a collective process though! And from that perspective the work of a mathematician hasn't changed much since Euclid, just that now we can use computers to communicate (via email, Zulip, ...) and to write proofs with code.

view this post on Zulip Alexis Toumi (Nov 03 2023 at 10:55):

My bad, according to Wikipedia the word "barista" has been used for at least 107 years by now!

Prieto (2021) shows that the word barista has been documented since 1916 in both Spanish and Italian.

But espresso has a super interesting history of being developed along with the industrial revolution: with the steam engines of our trains we go faster than ever, so now we gotta use the same steam to brew our coffee faster than ever.

view this post on Zulip John Baez (Nov 03 2023 at 11:03):

Coffee has been part of western civilization for much less time than mathematics, but I bet it's increased the rate of mathematical progress! According to Wikipedia, Vienna played a special role:

it is now widely accepted that the first Viennese coffeehouse was actually opened by an Armenian merchant named Johannes Diodato (Asdvadzadur) in 1685. Fifteen years later, four other Armenians owned coffeehouses. The culture of drinking coffee was itself widespread in the country in the second half of the 18th century.

Over time, a special coffee house culture developed in Habsburg Vienna. On the one hand, writers, artists, musicians, intellectuals, bon vivants and their financiers met in the coffee house, and on the other hand, new coffee varieties were always served. In the coffee house, people played cards or chess, worked, read, thought, composed, discussed, argued, observed and just chatted. A lot of information was also obtained in the coffee house, because local and foreign newspapers were freely available to all guests. This form of coffee house culture spread throughout the Habsburg Empire in the 19th century.

Scientific theories, political plans but also artistic projects were worked out and discussed in Viennese coffee houses all over Central Europe. James Joyce even enjoyed his coffee in a Viennese coffee house on the Adriatic in Trieste, then and now the main port for coffee and coffee processing in Italy and Central Europe. From there, the Viennese Kapuziner coffee developed into today's world-famous cappuccino. This special multicultural atmosphere of the Habsburg coffee houses was largely destroyed by the later National Socialism and Communism and can only be found today in a few places that have long been in the slipstream of history, such as Vienna or Trieste.

view this post on Zulip Morgan Rogers (he/him) (Nov 03 2023 at 11:04):

Matteo Capucci (he/him) said:

mathematics is inherently about the collective understanding of mathematical objects. It's not some form of meditation in which your internal experience is the only thing that matters.

So as long as humans remain relevant and interested in the world around them, they will want to study maths. And this entails a communal effort to distill, transmit and elaborate mathematical knowledge. That's why I believe future humans will be heavily assisted by machine reasoning, but they certainly won't stop doing maths.

I am so on board with this. Mathematics without human involvement doesn't merit its name.
On the other hand, at the same time as proof bots, people are working on pedagogical AI, so it's possible that we could hit a point where we create a mathematics oracle that can genuinely answer our questions in a form we will understand...

view this post on Zulip John Baez (Nov 03 2023 at 11:08):

Mathematics without human involvement doesn't merit its name.

If machines start doing interesting enough things, I will be happy to call them human, or whatever they want to be called. I'll also be happy to let them do mathematics. They might be a lot better than us.

I think space colonization with current-day humans is an idiotic idea, since we need air and food, we only thrive in a narrow range of temperatures, etc. - one science fiction writer called it "sending canned primates to Mars". But if machines become intelligent enough, or we create cyborgs who can survive vacuum, there's a chance that intelligent life - including mathematics - can spread across the universe.

view this post on Zulip John Baez (Nov 03 2023 at 11:10):

However, right now I'm more concerned that the climate crisis could put an end to our civilization.

view this post on Zulip JR (Nov 03 2023 at 12:30):

John Baez said:

Coffee has been part of western civilization for much less time than mathematics, but I bet it's increased the rate of mathematical progress! According to Wikipedia, Vienna played a special role:

it is now widely accepted that the first Viennese coffeehouse was actually opened by an Armenian merchant named Johannes Diodato (Asdvadzadur) in 1685. Fifteen years later, four other Armenians owned coffeehouses. The culture of drinking coffee was itself widespread in the country in the second half of the 18th century.

The story goes back a little bit further:

The history of Viennese coffee house culture is closely linked to the end of the Siege of Vienna in 1683. Legend has it that the Viennese citizen Georg Franz Kolschitzky (1640 - 1694) was the first to obtain a licence to serve coffee in the city following his heroic actions during the Siege of Vienna. The coffee beans left behind by the Turks were the basis of his success.

https://www.wien.gv.at/english/culture-history/viennese-coffee-culture.html

view this post on Zulip Matteo Capucci (he/him) (Nov 03 2023 at 13:31):

Alexis Toumi said:

It's a funny choice to take baristas as examples while their job did not exist 100 years ago and they are getting replaced by machines much faster than mathematicians (a quick search led to e.g. this article on how AI is used in the coffee industry)

Hehe, point taken :laughing: Let me just say that, as an Italian, I don't see our baristas going anywhere anytime soon (though I'm prepared to be wrong). But I think the same applies to other countries which have their fair share of 'local' cafés, as opposed to the soon-to-be-automated franchises.

view this post on Zulip Matteo Capucci (he/him) (Nov 03 2023 at 13:33):

Morgan Rogers (he/him) said:

[I]t's possible that we could hit a point where we create a mathematics oracle that can genuinely answer our questions in a form we will understand...

Hilbert would be, at last, vindicated!

view this post on Zulip John Baez (Nov 03 2023 at 15:23):

By the way, I think discovering new mathematics is much easier and in some sense less interesting than figuring out how to get lots of people to understand the mathematics that's already known. Of course nobody can understand all the math that's known, but there's room for a lot of improvement. It's possible to explain math much more clearly than people usually do! And it's possible to organize one's understanding to make it possible to understand at least the basics of many different subjects in math. Category theory can help here... but only if it's explained well enough.

view this post on Zulip John Baez (Nov 03 2023 at 15:24):

Computers might also help.

view this post on Zulip Jean-Baptiste Vienney (Nov 03 2023 at 15:35):

If we want some pieces of mathematics to be understood by lots of people, I think we should present them in a very fun and popular way. Making great YouTube videos, for instance, would maybe be more efficient than writing books.

view this post on Zulip John Baez (Nov 03 2023 at 15:39):

I invented the idea of a blog to explain research-level math to lots of mathematicians. But now that's a bit outdated. YouTube videos would work better - if they were done well. I've considered making them. But I decided it would take too much time to do a good job. Some younger mathematician should do it: someone who wants to explain huge amounts of math in an organized way. If they did it well they could become famous.

view this post on Zulip David Egolf (Nov 03 2023 at 16:40):

For myself, I prefer written explanation to video-based explanation, if I'm trying to really understand something in detail. I think part of the reason is that it's easier to jump back and forth and to look at multiple sections at once. I also find it easier to take in something at my own pace when it's in written form. (That being said, videos can be great fun and quite informative as well! Probably videos are often better for purposes of helping people get interested and excited about something.)

On the topic of computers and explaining math - especially looking to the future - I wonder if someday we'll have the capability to "fine-tune" a large language model (or some successor AI system) to a specific conversation or explanation. The author types out their explanation to set both the style and the mathematical content. Then the resulting AI-based explanation would take the form of an interactive conversation with the AI system that (hopefully) closely follows the style and content specified by the author. This would require a level of precision in conversation from the AI system that we don't have yet... but maybe someday we will!

view this post on Zulip effy (Nov 03 2023 at 20:49):

John Baez said:

I invented the idea of a blog to explain research-level math to lots of mathematicians. But now that's a bit outdated. YouTube videos would work better - if they were done well. I've considered making them. But I decided it would take too much time to do a good job. Some younger mathematician should do it: someone who wants to explain huge amounts of math in an organized way. If they did it well they could become famous.

I would watch a Baez YouTube channel 100%. There are a lot of math/science YT videos that have high production value but not very interesting content bc they are not done by working mathematicians. Consider me subscriber #1 if you start it.

view this post on Zulip John Baez (Nov 03 2023 at 21:46):

Thanks! So I'll have at least one subscriber. :upside_down:

view this post on Zulip John Baez (Nov 03 2023 at 21:48):

I'd considered spending more and more time on math exposition, since I enjoy it so much... but life has recently pulled me in a more urgent direction, as I'll explain soon.

view this post on Zulip fosco (Nov 04 2023 at 10:15):

mathematics is inherently about the collective understanding of mathematical objects. It's not some form of meditation in which your internal experience is the only thing that matters.

I disagree, mathematics is both: why do you deny half of what every religion is made of?

view this post on Zulip Matteo Capucci (he/him) (Nov 04 2023 at 12:07):

fosco, we're not disagreeing. The key word is only.

view this post on Zulip Patrick Nicodemus (Nov 04 2023 at 20:24):

@JR I am against LLM hype in general and the gushing of AI grifters and hucksters. I think there is widespread disinformation about this topic and I'm generally in favor of warning of the limitations of these models.

That being said, I think that this Zulip is a pretty good community, people here are pretty smart, and there is room for some more nuance. For this reason, unless we start getting really aggressive AI people spreading disinformation and gushing about human-level AI replacing all of us with bots in the next five years, I think that it is better not to make statements like "LLMs are fundamentally incapable of XYZ because of their architecture." It is certainly true that deep neural networks do not have clear ways to store symbolic knowledge and apply symbolic derivation rules reliably in the way a theorem prover or SQL database does. However, neither does the human brain.
As unfortunate as it is, humans also make mistakes, typos, mis-speak, talk nonsense, and some of us suffer from schizophrenia, hallucinations, delusions, and so on. We also, regrettably, sometimes talk complete bullshit confidently. Nevertheless we have "achieved so much - the wheel, New York, wars and so on" (Hitchhiker's Guide to the Galaxy quote), and so in general I would expect LLMs to be extremely useful if we can reduce the hallucinations and confident bullshit by an order of magnitude or so. In my view, the most serious danger from AI right now (and of course it is already ongoing) is a collective business/managerial decision that the increase in "productivity" from using ChatGPT is worth the hallucinations and bullshit in the final document, so that we see a general reduction in the quality of consumer programs, Stack Overflow answers and so on as unvetted ChatGPT material makes its way into source code and other materials.

But as for mathematics, if we can hook them up to theorem provers which do verify the correctness of proofs then the unreliability will not be an issue. The technology is not at the point where they can discover correct proofs reliably, but people are already writing papers on closed-loop systems where the neural network either steers the proof search by heuristically choosing the most likely next step in the proof or generates the entire proof. Then the error messages from Coq get fed back into ChatGPT, which obsequiously responds "You're right! How clumsy of me. This next proof will _surely_ be correct:" and so on in a loop. You would hope that after a few hundred iterations of this it eventually stumbles across a correct proof, if the theorem is simple enough, and I have no problems with that kind of application.
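A minimal sketch of such a loop, assuming a hypothetical `llm_complete` function for the LLM call and Coq's real `coqc` command-line compiler as the checker (everything else is illustrative):

```python
# Sketch of the closed loop described above. `llm_complete` is a
# hypothetical stand-in for an LLM API call; `coqc` is Coq's actual
# command-line compiler, used here as the proof checker.
import os
import subprocess
import tempfile

def coq_check(script: str):
    """Compile a Coq script; return (succeeded, error_output)."""
    with tempfile.NamedTemporaryFile("w", suffix=".v", delete=False) as f:
        f.write(script)
        path = f.name
    try:
        result = subprocess.run(["coqc", path], capture_output=True, text=True)
        return result.returncode == 0, result.stderr
    finally:
        os.remove(path)

def prove(statement: str, llm_complete, max_iters: int = 300):
    prompt = f"Write a complete Coq proof of: {statement}"
    for _ in range(max_iters):
        script = llm_complete(prompt)      # hypothetical LLM call
        ok, err = coq_check(script)
        if ok:
            return script                  # checked by Coq, so reliability is no issue
        # Feed the error message back and ask for another attempt.
        prompt += f"\n\nYour last attempt failed with:\n{err}\nPlease fix it."
    return None                            # gave up within the budget
```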

To return to your original post, talking about "fundamental limits of iteratively predicting the next word" is fine but it is best to write down a clear formulation of the problem and rigorously work out what kind of problems can and cannot be reduced to next token prediction, rather than just saying "of course it's glorified autocomplete, how can we solve real substantial problems with it?"
I think this paper is a good-faith effort to formulate the next-token prediction problem computationally and link it to other problems in computational complexity theory, which helps us in getting a good sense of what it can and cannot do.
https://arxiv.org/pdf/2309.06979.pdf

view this post on Zulip Patrick Nicodemus (Nov 04 2023 at 20:33):

To oversimplify the problem a bit, if your transformer has a window of K tokens, drawn from a finite alphabet Sigma of cardinality N, then whatever prediction algorithm the transformer learns will be some function Sigma^K -> Sigma, which we can think of as the transition function of a finite state machine. Of course you could object here to treating a continuous system as a discrete one, but we communicate right now by transmitting packets of discrete bits over continuous sinusoidal representations of voltage levels (or light intensities if you have fiber-optic internet), so this is not a fatal objection, only added complexity.
So the question is more like: what kinds of functions can be approximated by a finite state machine Sigma^K -> Sigma? And I'm sure it's quite a lot of very interesting functions when N and K are large, so I wouldn't rule out useful applications based on the form of the problem.
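Spelled out as formulas (a sketch; greedy, deterministic decoding is a simplifying assumption here, since a real model outputs a probability distribution over Sigma):

```latex
% Sketch of the finite-state view. Greedy, deterministic decoding is an
% extra simplifying assumption: a real model outputs a probability
% distribution over \Sigma rather than a single token.
f \colon \Sigma^{K} \to \Sigma, \qquad |\Sigma| = N
% As an automaton: states are windows of K tokens, and emitting the
% predicted token shifts the window by one.
\delta(w_1 w_2 \cdots w_K) = w_2 \cdots w_K \, f(w_1 w_2 \cdots w_K)
% A crude count: there are at most N^{N^K} such predictors.
```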

view this post on Zulip Damiano Mazza (Nov 06 2023 at 09:04):

Patrick Nicodemus said:

But as for mathematics, if we can hook them up to theorem provers which do verify the correctness of proofs then the unreliability will not be an issue.

This is very interesting! You mention that there are papers on this, could you please give some pointers if you have them?

view this post on Zulip Spencer Breiner (Nov 06 2023 at 11:39):

John Baez said:

I think space colonization with current-day humans is an idiotic idea, since we need air and food, we only thrive in a narrow range of temperatures, etc. - one science fiction writer called it "sending canned primates to Mars".

Plus spreading our dirty germs to Mars or Europa might wipe out our best chances to discover life outside of Earth.

view this post on Zulip Patrick Nicodemus (Nov 06 2023 at 12:59):

Damiano Mazza said:

Patrick Nicodemus said:

But as for mathematics, if we can hook them up to theorem provers which do verify the correctness of proofs then the unreliability will not be an issue.

This is very interesting! You mention that there are papers on this, could you please give some pointers if you have them?

This paper has large language models generate entire Isabelle proof scripts.
https://arxiv.org/abs/2210.12283
In my opinion this method would be less successful for Coq or Lean because Isabelle's proof language is more declarative and all the reasoning is explicit. Language models do better when they are allowed to "think out loud" (chain of thought reasoning) as it makes next token prediction easier. Coq and Lean have implicit proof state that is mutated by imperative tactic commands.

The idea of this paper is to use a language model to generate partial proofs with holes, where the holes are small enough to be dispatched to a powerful brute-force automated theorem prover for first-order logic.
https://arxiv.org/pdf/2303.04910.pdf

Fundamentally, automated theorem proving is a tree search problem, where each node is a step in the proof and the child nodes represent the possible next steps in the deduction. Some steps of reasoning may be logically valid but take you in the wrong direction:
"We want to prove n=m. Because the successor function is injective it suffices to prove n+1=m+1. It suffices to prove n+2=m+2. It suffices to prove n+3=m+3..." and so on in an infinite loop. So what is needed are good search heuristics to guide the search toward the most promising next steps in the argument.
There are many papers on this, for example
https://arxiv.org/pdf/2205.11491.pdf
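For concreteness, here is a minimal best-first search skeleton of the kind these heuristics plug into (`next_steps`, `score`, and `is_closed` are hypothetical stand-ins for the proof calculus and the learned heuristic):

```python
# Minimal best-first proof search skeleton. `next_steps` (legal inferences
# from a proof state), `score` (the learned heuristic; lower means more
# promising), and `is_closed` (all goals discharged) are hypothetical
# stand-ins for whatever prover and model you plug in.
import heapq
import itertools

def best_first_search(goal, next_steps, score, is_closed, budget=10_000):
    tie = itertools.count()  # tie-breaker so the heap never compares states
    frontier = [(score(goal), next(tie), goal, [])]
    while frontier and budget > 0:
        budget -= 1
        _, _, state, proof = heapq.heappop(frontier)
        if is_closed(state):
            return proof  # the sequence of inference steps found
        for step, new_state in next_steps(state):
            heapq.heappush(
                frontier, (score(new_state), next(tie), new_state, proof + [step])
            )
    return None  # budget exhausted; a bad heuristic just wanders (n+1=m+1, ...)
```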

Another good use of AI here is to learn to select the premises fed to the automated theorem provers: you want to give the prover all the lemmas it needs to solve the problem without introducing too many irrelevant ones. https://arxiv.org/abs/2205.01981

view this post on Zulip JR (Nov 06 2023 at 13:02):

Patrick Nicodemus said:

JR
To return to your original post, talking about "fundamental limits of iteratively predicting the next word" is fine but it is best to write down a clear formulation of the problem and rigorously work out what kind of problems can and cannot be reduced to next token prediction, rather than just saying "of course it's glorified autocomplete, how can we solve real substantial problems with it?"
I think this paper is a good-faith effort to formulate the next-token prediction problem computationally and link it to other problems in computational complexity theory, which helps us in getting a good sense of what it can and cannot do.
https://arxiv.org/pdf/2309.06979.pdf

Maybe https://arxiv.org/abs/2310.07570 is worth a look as well. But the key point, in my view, is that new things are unlikely to be figured out, though known things can be worked with more easily. The spaces of facts and proofs are big, and the autoregressive heuristic seems utterly inadequate for navigating the former meaningfully, though it may succeed in the latter.

view this post on Zulip JR (Jan 17 2024 at 18:49):

JR said:

[...]

Maybe https://arxiv.org/abs/2310.07570 is worth a look as well. But the key point, in my view, is that new things are unlikely to be figured out, though known things can be worked with more easily. The spaces of facts and proofs are big, and the autoregressive heuristic seems utterly inadequate for navigating the former meaningfully, though it may succeed in the latter.

https://doi.org/10.1038/s41586-023-06747-5

In which synthesizing artificial theorems en route to "monkey see, monkey do" is the key enabler. Not clear this sort of thing can ever work without a very clean domain-specific representation and symbolic reasoning engine already at hand, neither of which would seem to be present for "interesting" things.

view this post on Zulip Madeleine Birchfield (Jan 17 2024 at 23:11):

A few days ago DeepMind came out with a new model called AlphaGeometry, which is able to solve geometry problems at the International Mathematical Olympiad:

https://deepmind.google/discover/blog/alphageometry-an-olympiad-level-ai-system-for-geometry/

What makes this different from previous models is that DeepMind paired an LLM with a symbolic deduction system, where the LLM puts out possible suggestions for solving the problem and the deduction system checks to see whether the LLM's suggestions work or not.

view this post on Zulip JR (Jan 18 2024 at 00:58):

This sounds familiar.