Recently there has been a spate of papers applying categorical gluing methods to prove normalization theorems. I guess this idea goes back a ways (e.g. to Fiore 2002 if not further), but recent examples include Sterling-Spitters, Sterling-Angiuli, and Gratzer.
Supposing (for the moment) that I understand what's going on in these categorical "normalization by gluing" arguments, how are these mathematical normalization results related to the normalization algorithms implemented in proof assistants? Most of these papers include some remarks about the connection of "normalization by gluing" to algorithms called "normalization by evaluation", but I am having trouble making this anything more than a vague analogy.
To be sure, both involve an intermediate step consisting of "values" in the metalanguage. But in NbG, these "values" are a family of set-valued presheaves on the category of renamings, indexed by the types of the theory and defined recursively over those types by applying the universal property of the syntactic category (in a dependent type theory, the recursion is mutual with the action on morphisms of a functor defined by this universal property). While in a bare-bones description of NbE as, for instance, on wikipedia, these "values" are a single "recursive datatype" -- a sort of "inductively defined" object that's allowed to appear non-strictly-positively in its own constructors.
So there are at least two big differences. First, in NbG there are a whole bunch of "types of values", indexed by pairs of types and contexts in the theory: one presheaf for each type, and each presheaf consists of one set for each context. While in NbE (at least, in the naive presentation on wikipedia) there is only one type of values. Second, in NbG these types of values are defined recursively over the types of the theory, while in NbE the type of values is "inductively" defined -- with a sort of "induction" that, I believe, doesn't even exist in categories of sets or set-valued presheaves (you have to use domains).
Is there any way of looking at an NbG proof and directly extracting an algorithm that can be explained naively and coded in a simply-typed programming language? Or, conversely, of looking at a naive NbE algorithm and translating it into an NbG proof?
Probably worth noting that, in the Wikipedia article, the NbE is "untyped". In Coq, the semantic interpretation of a term will be by induction over its type derivation: https://github.com/huitseeker/nbe/blob/2187fa936e890e475adf9c178dbf0487c655166f/nbe.v#L99
No `Lam : (A -> A) -> A` required.
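For concreteness, here is a minimal Haskell sketch of the single-datatype, "untyped" style of NbE under discussion, the one where a constructor like `Lam : (A -> A) -> A` does appear. All names are invented for illustration; this is not the algorithm from the linked Coq development or from Wikipedia, just the general shape.

```haskell
-- Object-language terms, with de Bruijn indices.
data Tm = Var Int | App Tm Tm | Abs Tm
  deriving Show

-- A single domain of "values". Note the negative occurrence of Sem in Lam:
-- this is the non-strictly-positive "induction" that has no justification in
-- sets or presheaves (one reason the classical story uses domains).
data Sem
  = Lam (Sem -> Sem)   -- functions of the metalanguage
  | NVar Int           -- a stuck variable, named by a de Bruijn level
  | NApp Sem Sem       -- a stuck application (head is itself stuck)

eval :: [Sem] -> Tm -> Sem
eval env (Var i)   = env !! i
eval env (Abs t)   = Lam (\v -> eval (v : env) t)
eval env (App t u) =
  case eval env t of
    Lam f -> f (eval env u)          -- a real metalanguage application
    n     -> NApp n (eval env u)     -- otherwise the application is stuck

-- reify k v: read a value back as a term under k binders (levels become indices).
reify :: Int -> Sem -> Tm
reify k (Lam f)    = Abs (reify (k + 1) (f (NVar k)))
reify k (NVar l)   = Var (k - l - 1)
reify k (NApp n v) = App (reify k n) (reify k v)

normalize :: Tm -> Tm
normalize t = reify 0 (eval [] t)
```

Since nothing here is indexed by types, this only computes beta-normal forms (no eta-expansion), and on terms with no normal form it simply diverges; the type- and context-indexed values of the gluing proofs are exactly what is missing from this picture.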
I am not sure what you mean by "in Coq". In a simply typed programming language it is not possible to define a family of types by recursion.
I guess there's a separate question of to what extent one can practically implement an NbG proof in a constructive dependent type theory and then "execute" it. Is that more like what you're talking about? One could then try to apply some transformations to make the result into a more explicit simply typed algorithm. Has that been done anywhere?
That is indeed what I am talking about. You probably do need dependent types to carry out this proof. Indeed I conjecture that for a given simply typed term, you can carry out the NbE within MLTT without universes, though not _uniformly_ (since that would imply normalization)
There's probably some sense in which the "NbE witness" of a given well-typed term is the "decoration" of that term in the sense of Bernardy and Lasson: https://who.rocq.inria.fr/Marc.Lasson/articles/RealParam/RealParam.pdf
There they introduce a kind of type system that is just expressive enough to encode the "parametricity statement" of a term, but I think you can do the same for the "NbE witness" of a term...
This is a bit vague, and I apologize
Mike Shulman said:
Recently there has been a spate of papers applying categorical gluing methods to prove normalization theorems. I guess this idea goes back a ways (e.g. to Fiore 2002 if not further), but recent examples include Sterling-Spitters, Sterling-Angiuli, and Gratzer.
Supposing (for the moment) that I understand what's going on in these categorical "normalization by gluing" arguments, how are these mathematical normalization results related to the normalization algorithms implemented in proof assistants? Most of these papers include some remarks about the connection of "normalization by gluing" to algorithms called "normalization by evaluation", but I am having trouble making this anything more than a vague analogy.
To be sure, both involve an intermediate step consisting of "values" in the metalanguage. But in NbG, these "values" are a family of set-valued presheaves on the category of renamings, indexed by the types of the theory and defined recursively over those types by applying the universal property of the syntactic category (in a dependent type theory, the recursion is mutual with the action on morphisms of a functor defined by this universal property). While in a bare-bones description of NbE as, for instance, on wikipedia, these "values" are a single "recursive datatype" -- a sort of "inductively defined" object that's allowed to appear non-strictly-positively in its own constructors.
So there are at least two big differences. First, in NbG there are a whole bunch of "types of values", indexed by pairs of types and contexts in the theory: one presheaf for each type, and each presheaf consists of one set for each context. While in NbE (at least, in the naive presentation on wikipedia) there is only one type of values. Second, in NbG these types of values are defined recursively over the types of the theory, while in NbE the type of values is "inductively" defined -- with a sort of "induction" that, I believe, doesn't even exist in categories of sets or set-valued presheaves (you have to use domains).
Is there any way of looking at an NbG proof and directly extracting an algorithm that can be explained naively and coded in a simply-typed programming language? Or, conversely, of looking at a naive NbE algorithm and translating it into an NbG proof?
Gluing and realizability are two different ways to think about connecting "math" to syntax. In either case, it seems possible to formulate notions like reification and reflection, the main constituents of the nbe/nbg approach. To me that (reify/reflect) is the important idea, probably due to Tait or maybe even going further back(?), and that there are a variety of settings in which one can implement that idea, including realizability-style and gluing-style settings.
I wasn't thinking about realizability at all. Are you saying that "naive NbE" is a sort of realizability?
Yes, I think that naive nbe is a form of realizability (in the very broad sense of realizability as a field that goes beyond the Hyland-etc. tradition) and that if one wanted to do so, it would probably be possible to make the connection explicit.
Hmm, okay, that's something to chew on.
The classic proofs involving nbe are very realizability-style anyway... Which makes sense, because they were proving normalization over an untyped domain using partial equivalence relations.
So I remember you (Jon) saying that the normalization proof by gluing (of cubical type theory, say) is not exactly the normalization algorithm implemented in a proof assistant. But somehow I had the impression that the connection was fairly close at least intuitively, in that you could look at one and see the outlines of the other. I've been trying to do that in a simple case, but I can't see much connection other than that in both cases there is something called "reify" and something called "reflect". Maybe the impression I got was of more closeness than you intended to convey.
Yes, the close connection is that they share the same main idea: a saturation law for values between neutrals and normals, called reflect/reify. The details of the implementation of this idea may differ --- for instance, common implementation techniques like defunctionalization totally change where these things get called, and can disrupt the symmetry that you find in the mathematics...
That is why I am urging people to be cautious about trying to make too close a formal connection. Even something like the particular way that you deal with variables can totally obscure a connection to the math... So when comparing the implementation and the proof, we have to use the right vantage point. In both cases, algorithms and proofs are about ideas.
With that said, in all cases, there is an algorithm for implementing "reflect" on a connective. And this algorithm is faithfully represented in the Fiore-Altenkirch-Kaposi-Coquand-Sterling-Angiuli-blah-blah-blah proofs.
One (the only) benefit of the realizability-style approaches (as exemplified in the habilitationsschrift of Andreas Abel, as well as the paper by Daniel and Lars and myself for a modal type theory) is that you really are proving something about an actual algorithm that is extremely close to the implementation.
To see the unity between all these approaches, I feel that it is necessary to accept a bit of fuzz between them so that the main ideas (reification and reflection, etc.) can be allowed to speak for themselves.
Does it really satisfy you to have a mathematical proof that there exists a normalization function, and an implementation of a "normalization algorithm", but no proof that the algorithm actually implements the function? In particular, can you actually prove that the actual normalization algorithm used in a proof assistant actually terminates?
I would say that as a category theorist I am allergic to fuzz. (-:O
It does satisfy me, because as an implementer what interested me was the main idea --- the important lever that conceptually makes it possible to do normalization. With that said, I do know how to prove that the actual algorithm terminates --- using a realizability approach like in the old days. I don't bother with this because those proofs are absurdly difficult. But I think someone who was more interested in it than me could try and transform those into actual mathematics (like I did for the nbg arguments of Coquand and friends), and this could be a nice result for someone's thesis.
Meaning that whenever I see an analogy I want to make it precise, and category theory is an excellent tool for finding frameworks for doing that.
I think that category theorists use "fuzz" all the time --- it is part of how you decide what will be a conjecture and what won't. But what I like about category theorists is they tend to go further... and make the connection precise.
Right.
All I am trying to say is that you will really suffer if you try to make the connection between the two things, as they currently stand, precise, but that there is a direction one could pursue (which would likely change both things being compared) that could lead to more precision.
To me this isn't one of my research questions, but I think it's a very good thing to work on. Someone with more of a stomach for realizability than me should take it on :)
Well, phooey.
I hoped this was something someone already understood and could explain to me.
(-:
Ahh... It would be great if someone does understand it, but what I've said is the limit of my understanding. Hopefully someone else will chime in and save us!
Mike Shulman said:
I hoped this was something someone already understood and could explain to me.
I had also always assumed this was possible, but not spelled out anywhere. The truth seems quite disappointing!
It's one of those things that constructivists often gloss over --- e.g. the coquand proof is constructive, so it carries an algorithm, but this algorithm is actually not what you would implement. To me the extracted algorithm is still very interesting, because it has the main ideas of nbe inside it, but like most instances of the "extraction dream", it's really just an illusion.
Does it not seem a bit disingenuous to call both NbE if they're not actually the same in a precise sense?
NbE for its entire history has meant a constellation of techniques that are all very different from each other, but share some of several key features. (It is a "family resemblance")
Maybe I would be more satisfied if I felt like I understood the "main idea" of reify/reflect that you refer to. This "yoga" frequently seems to be introduced with little explanation of where it comes from or why it's natural. I feel like I understand proofs of canonicity by gluing: the scone is an obviously natural categorical thing, and from there the proof more or less writes itself. But normalization by gluing feels to me more like a categorical argument cooked up to match an algorithm that someone already had in mind, and yet I can't find an explanation of that algorithm itself that actually motivates it other than by "this works".
Jonathan Sterling said:
NbE for its entire history has meant a constellation of techniques that are all very different from each other, but share some of several key features. (It is a "family resemblance")
Right, I see.
Jonathan Sterling said:
NbE for its entire history has meant a constellation of techniques that are all very different from each other, but share some of several key features. (It is a "family resemblance")
maybe we should retire the name ;-)
Mike Shulman said:
Maybe I would be more satisfied if I felt like I understood the "main idea" of reify/reflect that you refer to. This "yoga" frequently seems to be introduced with little explanation of where it comes from or why it's natural. I feel like I understand proofs of canonicity by gluing: the scone is an obviously natural categorical thing, and from there the proof more or less writes itself. But normalization by gluing feels to me more like a categorical argument cooked up to match an algorithm that someone already had in mind, and yet I can't find an explanation of that algorithm itself that actually motivates it other than by "this works".
Yes, this is basically where I am at too. I think we don't yet know why this wondrous idea works. I expect that when we find a more mathematically forced normalization argument, it may not even involve this "yoga"! Or if it does, it will do so in a very changed way.
Can we at least say anything about how people came up with it in the first place?
Jonathan Sterling said:
It's one of those things that constructivists often gloss over --- e.g. the coquand proof is constructive, so it carries an algorithm, but this algorithm is actually not what you would implement.
Isn't there a bit of a circularity there too? In order to extract an algorithm from a proof in a constructive metatheory, don't you need a normalization algorithm for the metatheory?
Mike Shulman said:
Jonathan Sterling said:
It's one of those things that constructivists often gloss over --- e.g. the coquand proof is constructive, so it carries an algorithm, but this algorithm is actually not what you would implement.
Isn't there a bit of a circularity there too? In order to extract an algorithm from a proof in a constructive metatheory, don't you need a normalization algorithm for the metatheory?
Absolutely not --- we only need to compute closed terms of the metatheory. Another approach is to have realizability as your metatheory. Either way would work.
So you need a canonicity algorithm for the metatheory?
Mike Shulman said:
Can we at least say anything about how people came up with it in the first place?
Yes. It really is possible to see why it is "forced" in an algorithmic sense if you climb the mountain of beta-short/eta-long normal forms. To me, the reify-reflect yoga and the formulation of eta-long normal forms in terms of neutrals and normals are two forms of the same idea, so what one should try to overcome is not one vs the other, but both!
Mike Shulman said:
So you need a canonicity algorithm for the metatheory?
Well, as I pointed out, an alternative is to use a realizability topos as your metalanguage. But basically yes.
Jonathan Sterling said:
Mike Shulman said:
Can we at least say anything about how people came up with it in the first place?
Yes. It really is possible to see why it is "forced" in an algorithmic sense if you climb the mountain of beta-short/eta-long normal forms. To me, the reify-reflect yoga and the formulation of eta-long normal forms in terms of neutrals and normals are two forms of the same idea, so what one should try to overcome is not one vs the other, but both!
I am short on time tonight, but if you want to get sent down the rabbit hole --- first just try to close every connective under an operation that sends "elements" to normal forms, assuming that you have such an operation for all the constituent types. You will quickly see why you need reflection.
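To make that rabbit hole concrete, here is a hedged Haskell sketch (invented names, simply typed lambda calculus with one base type) of reify defined by recursion on object-language types; the point is that the function-type case cannot be completed without inventing reflect.

```haskell
-- Beta-short, eta-long normal forms: a neutral counts as normal only at base type.
data Nf = NLam Nf | NUp Ne          deriving Show
data Ne = NVar Int | NApp Ne Nf     deriving Show   -- variables are de Bruijn levels

data Ty = Base | Ty :=> Ty

-- "Values", indexed informally by Ty: at Base, a neutral waiting to be told how
-- many binders are in scope; at (a :=> b), a metalanguage function on values.
data Val = VBase (Int -> Ne) | VFun (Val -> Val)

-- reify k ty v: send a value of type ty to a normal form, under k binders.
reify :: Int -> Ty -> Val -> Nf
reify k Base      (VBase n) = NUp (n k)
reify k (a :=> b) (VFun f)  =
  -- To build a normal lambda we must apply f to something standing for the fresh
  -- variable (NVar k); but f expects a Val, and NVar k is only a neutral.
  -- Turning that neutral into a Val is exactly the job of reflect.
  NLam (reify (k + 1) b (f (reflect a (\_ -> NVar k))))
reify _ _ _ = error "ill-typed value (sketch only)"

-- reflect ty n: send a neutral of type ty to a value, eta-expanding as it goes.
reflect :: Ty -> (Int -> Ne) -> Val
reflect Base      n = VBase n
reflect (a :=> b) n =
  VFun (\v -> reflect b (\k -> NApp (n k) (reify k a v)))

-- For example:  reify 0 ((Base :=> Base) :=> (Base :=> Base)) (VFun id)
-- evaluates to  NLam (NLam (NUp (NApp (NVar 0) (NUp (NVar 1)))))
-- i.e. the eta-long normal form  \f. \x. f x  of the identity.
```

The `Int -> Ne` inside `VBase` (a neutral parameterized by the current number of binders) is a crude stand-in for the presheaf-on-renamings indexing that the gluing proofs make precise.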
Jonathan Sterling said:
Yes. It really is possible to see why it is "forced" in an algorithmic sense if you climb the mountain of beta-short/eta-long normal forms.
Is there an exposition of this written out somewhere? I looked at Abel's thesis, but he basically pulls it right out of a hat at the beginning as far as I can see.
To me, the reify-reflect yoga and the formulation of eta-long normal forms in terms of neutrals and normals are two forms of the same idea, so what one should try to overcome is not one vs the other, but both!
Hmm. I must be more confused than I thought, then, because I feel like I understand the latter but not the former. Supposing I have neutrals and normals, how does it follow that I must have a set of "values" that admits a map to the normals and from the neutrals?
There are some attempts to explain this... Bob tried in a note that is somewhere (I can try to find it).
Mike Shulman said:
To me, the reify-reflect yoga and the formulation of eta-long normal forms in terms of neutrals and normals are two forms of the same idea, so what one should try to overcome is not one vs the other, but both!
Hmm. I must be more confused than I thought, then, because I feel like I understand the latter but not the former. Supposing I have neutrals and normals, how does it follow that I must have a set of "values" that admits a map to the normals and from the neutrals?
Again, I am not speaking of a formal correspondence. I am speaking of a scientific correspondence in the sense that the structure of the Tait reify/reflect configuration is determined by the structure of normals and neutrals, and vice versa. A good way to see this in action would be to change one of the two, and see that an analogous change has to be made on the other side. (Look at, for instance, the way that the change in structure of the normals and neutrals of cubical type theory necessitated a change in the structure of the reflection map --- although, at times while I was inventing this, it felt like the force was coming from the other direction! Of course, it was coming from both directions.)
To reiterate, I feel that you will not be satisfied by these explanations --- because you are looking for a theory that has been completed and then freeze-dried, but that is just not the state of the science. So what we have are experts with a variety of reliable intuitions doing dark magic, but there is some rhyme and reason to it.
I think it would be really cool if you can figure out what we are talking about to the point that you would be able to do a better job than us in finding the mathematics behind it.
But one place to start, in terms of understanding why I feel they are reflections of the same concept, is to view the embedding of neutrals into normals as essentially a version of the reflection map that has been "frozen".
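One concrete reading of that "frozen" remark, continuing the typed toy sketch above (still my own illustration, with the same Ty, Ne, Nf, reify, reflect): the passage from neutrals to normals at an arbitrary type is eta-expansion, and eta-expansion is literally reflect followed by reify; at base type it degenerates to the bare embedding.

```haskell
-- Reusing Ty, Ne, Nf, reify and reflect from the sketch above.
etaExpand :: Int -> Ty -> Ne -> Nf
etaExpand k ty n = reify k ty (reflect ty (\_ -> n))
-- At Base this is just NUp n; at function types reflect supplies all the structure.
```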
Okay, I'll chew on that too. I really shouldn't be spending time working on this myself, but maybe something will pop up unlooked for, or maybe some other clever person will figure it all out.
Thanks for mentioning realizability anyway -- I already feel like that may help me a lot in understanding what's going on in naive NbE.
Mike Shulman said:
Okay, I'll chew on that too. I really shouldn't be spending time working on this myself, but maybe something will pop up unlooked for, or maybe some other clever person will figure it all out.
Sounds good, and good luck! I'm also working on a different collaboration related to these questions with someone who is a better category theorist than myself, so maybe we will find out some answers that would interest you :)
It's important to know that reify/reflect and quote/eval are intimately related. A nice paper by Danvy and Malmkjær, "Intensions and Extensions in a Reflective Tower", gives a very PL-oriented introduction. This has led to nice work on Collapsing Towers of Interpreters that let you have nested languages and 'flatten' them (I wonder if languages form a monad?).
I do derive some satisfaction that the implementation of the latter uses tagless-final to achieve stage-polymorphism.
NbE and partial evaluation are intimately related. Where NbE 'wins' is that it modifies the language in a semantically non-trivial way (normal vs neutral forms), whereas partial evaluation limits itself largely to binding-stage annotations. Only when you really do quite a bit more, as they do in the FREX project (free extensions, yay!) does partial evaluation become powerful again. There you can actually leverage much more of the equational theory of your language directly for compile-time optimizations. But it's tricky - even though "free extensions" always exist classically, they actually seldom exist constructively, so that one must apply a lot of creativity in practice. (This is where cubical techniques might be very interesting, as many of the interesting "free extensions" do indeed exist as QITs.)
That all sounds interesting, but it uses a lot of words that I don't understand.
I don't quite want to say "that was the point"... but I did want to indicate that there is quite a lot to this thread. Logicians have looked at quotation (and quasi-quotation) for a long time, in particular by Quine in the 1940s. PLs have had quote and eval since the early LISPs (so probably around 1960). That you can do really wild stuff with them was first shown by Futamura in 1971.
You can think of "towers of interpreters" as 'internal language of internal language of internal language of..." n times. Or you can think of it the other way around too: an internal category in an internal category in ... n times.
Once you start exploring those waters, you start to see that it's related to things you do know a lot about: n-theories, type theory 'eating itself' and so on. (I got here the other way around, starting with writing lots of code that does fancy stuff, then trying to find decent theory that explains that it's not just a fluke.)
Thanks for these links @Jacques Carette! There is a lot of stuff, especially in the orbit of Danvy and also Filinski, that I would like to better understand.
Nice entry points to Danvy & Filinski's ideas are often in papers of Kiselyov and Shan, and also Sabry, if you want something gentler --- though you're perhaps one of the few who can read Filinski directly and understand it.
Jacques Carette said:
Nice entry points to Danvy & Filinski's ideas are often in papers of Kiselyov and Shan, and also Sabry, if you want something gentler --- though you're perhaps one of the few who can read Filinski directly and understand it.
Thanks for the pointer :) I don't know specifically about Filinski, but when there is some intimidating literature I have usually not found that I can attack it directly haha
Jonathan Sterling said:
Mike Shulman said:
So you need a canonicity algorithm for the metatheory?
Well, as I pointed out, an alternative is to use a realizability topos as your metalanguage. But basically yes.
This confuses me: extracting algorithms from constructive proofs certainly does not require knowing that the proof terms normalize; simply that they have a computational interpretation! Certainly they might loop (say if your constructive theory is inconsistent :)
@Cody Roux Well, when I say I want an algorithm, I mean I want a terminating algorithm. So if you prove in a constructive metatheory that there exists a normalization function, and when I ask "what's the algorithm" you say that there's automatically an algorithm because the function was constructed in a constructive metatheory, don't you need to know that your constructive metatheory is consistent and canonizing in order for your proof to give me a terminating algorithm?
Jacques Carette said:
I don't quite want to say "that was the point"... but I did want to indicate that there is quite a lot to this thread.
Sure, I don't doubt it. But did you mean to say that any of those links actually answers my question?
Mike Shulman said:
But did you mean to say that any of those links actually answers my question?
About NbG vs NbE directly? No. I wanted to provide wider context, because there might actually be easier answers when the question is seen more broadly.
@Mike Shulman There's a difference between a terminating algorithm and a provably terminating algorithm. In the first case, you just need to "believe" in your constructive meta-theory, in the second you need to have some canonicity proof, which often requires assuming 1-consistency of your constructive-metatheory (or a classical version of it, actually).
But the whole point of proving that a type theory satisfies normalization is to have a provably terminating algorithm, isn't it?
Yes, of course, my point is just this: The step of extracting the normalization algorithm from the proof is separate from the proof itself, and can be done with a weak meta-theory. E.g. you can define a realizability interpretation of your strong constructive theory T in some tiny theory S, and in S prove that if T proves a statement then the extracted term realizes it. Of course S does not know whether this realizer halts.
There is an analogous situation for these normalization arguments.
Perhaps a more conventional (for the CT readership) presentation of the result and its proof would be helpful?
I found the Sterling-Angiuli paper (https://arxiv.org/abs/2101.11479) quite difficult to read, despite being well-acquainted with the system under consideration, the property asserted to hold for it, and the proposed method of proof. Perhaps the problem was that it was written for a different readership?
I see that Jon will give a talk about it at the upcoming CT - that would be a good opportunity to reformulate things in terms more familiar to categorical logicians. It is an important result (and method of proof) that deserves to be known outside the narrow range of specialists in type theory who can follow the current presentation.
Hi @Steve Awodey, thanks for your feedback! At CT I am only being given 10 minutes, so that will not be the hoped-for opportunity to communicate this result to category theorists. However, I would be very happy to discuss it with you and perhaps we can find a way to present the ideas that would be easier to understand for categorical logicians.
Oh no - 10 minutes is certainly not going to do it! That's barely enough time to state the result.
Yes, let's find another format to present and discuss it (not just with me).
Or indeed, in this very thread! I'm more psyched about understanding it as a CS dummy, which might be a bit more in line with answering @Mike Shulman's original question, since usually CS results have pretty clear computational content.
As a side note, here's a really fun paper on extracting the computational content from a normalization proof: https://hal.archives-ouvertes.fr/file/index/docid/150904/filename/snextr05.pdf