Category Theory
Zulip Server
Archive

You're reading the public-facing archive of the Category Theory Zulip server.
To join the server you need an invite. Anybody can get an invite by contacting Matteo Capucci at name dot surname at gmail dot com.
For all things related to this archive refer to the same person.

Stream: learning: questions

Topic: "probabilistic" algebraic geometry?

David Egolf (Dec 13 2024 at 01:56):

I've been working on learning some of the introductory ideas of algebraic geometry. To my understanding, in this context there is a duality between algebra and geometry:

the larger an ideal of polynomials...
...the smaller the geometric region where all the polynomials in that ideal vanish.

In the extreme, there is a correspondence between maximal ideals (large and algebraic) and "smallest points" (small and geometric).

Intuitively, this seems like a manifestation of the duality between information and ambiguity. The more one knows about something, or the more that one observes something, the less ambiguity remains regarding what that something can possibly be. So if I pick some mystery geometric region and gradually tell you more and more polynomials that vanish on that region, then you will be able to gradually sharpen your estimate of my chosen mystery region.

Because I care about medical imaging, I care about this duality between information and ambiguity in the case where the information provided is imperfect. In this case, one's estimates of what is present are also imperfect. Motivated by this, I'm curious if algebraic geometry has been developed in the direction of exploring the duality between algebraic and geometric structures in the case where the algebra is "probabilistic" in some sense.

As an initial guess, perhaps one could try to define a probabilistic notion of a polynomial. Instead of being able to assess if a polynomial is zero or not at a point in question, we would instead be given the ability to assess the probability that a "probabilistic polynomial" is zero at some point in question.

To summarize, I have two specific questions:

Has algebraic geometry been developed in the "probabilistic" direction of exploring the duality between "probabilistic" algebra and geometry, for some definition of "probabilistic"?
What might be a notion of "probabilistic polynomial" that could allow one to mimic some of the most basic constructions or patterns of algebraic geometry? For example, I'd presumably want the "probabilistic polynomials" on some space to organize to form a ring.

John Baez (Dec 13 2024 at 02:23):

I'd be really interested if anyone has done something like this, but I feel you're stretching algebraic geometry in ways it doesn't want to go. Since you mention "information" and "probability", I'd be tempted instead to advertise information theory as the power tool for questions like yours.

Given two random variables $X$ and $Y$ , we can mathematically define three quantities:

$H(X)$ : the entropy of $X$ . Very roughly speaking this is how much you don't know concerning the value of $X$ .
$I(X;Y)$ : the mutual information of $Y$ and $X$ . Very roughly speaking this is how much information, on average, knowing the value of $Y$ would tell you about the value of $X$ .
$H(X|Y)$ : the conditional entropy of $X$ given $Y$ . Very roughly speaking this is how much you don't know about $X$ , on average, after you know the value of $Y$ .

Then there's an equation

$H(X|Y) = H(X) - I(X;Y)$

This says how much learning the value of $Y$ reduces how much you don't know concerning the value of $X$ , on average. That's how I'd be tempted to formalize the "duality" you are sensing.

It takes time to understand these concepts, but anyone who wants to understand the universe needs to learn about them - and they are very general and beautiful. Luckily they are explained on Wikipedia (at the links), and even better here:

Cover and Thomas, Elements of Information Theory.

The equation I wrote down is equation (2.39) in this book.

David Egolf (Dec 13 2024 at 02:34):

Thanks for reminding me of those concepts! Back in engineering grad school, I spent a little bit of time using these concepts to try and think about ultrasound imaging. I even spent some time with that book by Cover and Thomas. Ultimately other work was progressing towards publication more quickly and so we abandoned the idea. But it was interesting stuff! It might be fun to go back to it, now that I know some more math...

(We were specifically playing around with these concepts in simulation, trying to see if it was possible to optimize the length of an ultrasound pulse in 1D, given some statistical assumptions about the medium we're transmitting it through. (The longer the transmitted pulse the stronger your resulting observed signal and so the more robust you are to noise, but also the more autocorrelation crosstalk you get). I forget exactly what we were optimizing though... presumably we were trying to minimize the conditional entropy regarding the target given our resulting observations!?)

John Baez (Dec 13 2024 at 02:44):

Btw, if you ever want to see how entropy emerges naturally from category theory, there's an explanation in Tom's free book:

Tom Leinster, Entropy and Diversity: the Axiomatic Approach.

And I have my own free book on entropy in physics. There's remarkably little overlap in these three books, and a lot of basic stuff not covered by any of them, but Cover and Thomas is the one you want for the issues I was just mentioning.

Joe Moeller (Dec 13 2024 at 21:59):

I'll mention algebraic statistics without thinking too hard if it's truly an answer to the question.

John Baez (Dec 13 2024 at 22:13):

The section "Recent work using commutative algebra and algebraic geometry" might help @David Egolf come up with a good answer to his question, but I'm afraid it will take work: it's certainly not handing out anything on a silver platter.

Simon Burton (Dec 14 2024 at 13:11):

From the nlab: "Tropical geometry is often thought of as algebraic geometry over the tropical semiring." This tropical semiring is the min-plus rig and is used in statistical inference, for example here: The Generalized Distributive Law...

David Egolf (Dec 15 2024 at 17:55):

Joe Moeller said:

I'll mention algebraic statistics without thinking too hard if it's truly an answer to the question.

Thanks for linking that page! On that page, there are several things that look interesting to me. For example, from that article I found a paper by David Mumford called "Pattern Theory: The Mathematics of Perception".

David Egolf (Dec 15 2024 at 18:06):

The section "Recent work using commutative algebra and algebraic geometry" is also interesting, and probably relevant to what I asked about in this thread.

Roughly, the idea sketched in that section appears to me to be this: If we have a discrete probability distribution determined by some unknown parameter, then at least in some cases the individual probability values in each possible distribution are those that satisfy some polynomial equation. So we can then study our collection of possible probability distributions by studying where a certain polynomial goes to zero - which can presumably be done using algebraic geometry.

David Egolf (Dec 15 2024 at 18:07):

In this approach, it appears that we construct (standard) polynomials of interest from some probabilistic thing of interest. That's not a perfect match with what I was wondering about in this thread, but it's still interesting.

David Egolf (Dec 15 2024 at 18:35):

Simon Burton said:

From the nlab: "Tropical geometry is often thought of as algebraic geometry over the tropical semiring." This tropical semiring is the min-plus rig and is used in statistical inference, for example here: The Generalized Distributive Law...

This also looks quite interesting! I'll plan to leave a lengthier comment here once I've spent some time looking at this paper.

John Baez (Dec 15 2024 at 22:56):

A bunch of people have written about how the tropical rig is the $T \to 0$ limit of a rig that's important in statistical mechanics at temperature $T$ . I first learned about this from James Dolan and I could easily bloviate about it at length. I wrote a short version here:

The space of physical frameworks (part 1).

John Baez (Dec 15 2024 at 23:12):

But one really broad lesson is that a lot of algebraic geometry can be generalized from commutative rings to commutative rigs and even more general commutative Lawvere theories - Lawvere theories where all the operations "commute" in a suitable sense. (It takes a while to figure out what it means for a 3-ary operation and a 5-ary operation to "commute", but the answer is visually satisfying.) Nikolai Durov has a mammoth book generalizing lots of algebraic geometry (e.g. schemes) from commutative rings to commutative Lawvere theories, but part of why this book is so huge is that he assumes the reader knows little of category theory, and proceeds at a slow pace.

Rémy Tuyéras (Dec 16 2024 at 13:11):

In cryptography, polynomials are used to encrypt data. More specifically, cryptographers use ideals to introduce noise (randomness) into the encrypted data. To draw an analogy with your problem, @David Egolf, encrypted data can be compared to raw data awaiting analysis.

Below, I will denote by $m$ some valuable information that one wants to discover. In the context of encryption, $m$ is a message that an attacker wants to discover, but maybe in your case @David Egolf, $m$ is some information that you want to discover through some analysis (just to give some ideas here).

Now, the idea is to have a polynomial $e_m$ that encrypts/hides the information $m$ . In general, this polynomial will be considered modulo a certain polynomial $q$ such that it makes sense to denote $e_m' := e_m \,(\mathsf{mod}\,q)$ . An important detail to keep in mind here is that the inequality $e_m' \neq e_m$ will likely hold with high probability. However, we assume that $m$ is small enough to have $m=m \,(\mathsf{mod}\,q)$ .

The polynomial $e_m$ is usually encoded as a sum $a_m+b$ such that $m = a_m \,(\mathsf{mod}\,q)$ . This means that if you know $b$ , then you can discover $m = (e_m-b) \,(\mathsf{mod}\,q)$ using the subtraction $e_m'-b \,(\mathsf{mod}\,q)$ . In other words, the quantity $b$ will likely be the key focus of your research analysis in order to learn $m$ .

Still, encryption doesn’t stop there. The encryption $e_m$ is usually hidden by an extra polynomial $u$ (this is the noise), which would belong to an ideal of the form $(p)$ where $p$ is some other polynomial. In other words, we have $e_m = a_m+b+u$ where $u \,(\mathsf{mod}\,p) = 0$ .

In practice, we also assume that $m$ is small enough such that $m = m \,(\mathsf{mod}\,p)$ .

Depending on the size of the noise $u$ , two scenarios can occur:

If the noise $u$ is small enough, we have $u = u \,(\mathsf{mod}\,q)$ , and the information $m$ can be recovered using: $((e_m-b)\,(\mathsf{mod}\,q))\,(\mathsf{mod}\,p) = m$
If the noise $u$ is too large, we have $u \neq u \,(\mathsf{mod}\,q)$ . In this case, the data is probably too noisy to be analyzed because we would have: $((e_m-b)\,(\mathsf{mod}\,q))\,(\mathsf{mod}\,p) \neq m$

Rémy Tuyéras (Dec 16 2024 at 13:24):

A few remarks:

the noise $u$ makes it harder to learn $b$ when given $e_m' = e_m\,(\mathsf{mod}\,q)$ ;
you are only presented with $e_m'$ and not $e_m$ , which means that you have to remember that you have an unkown extra term $q\cdot v$ as follows: $e_m = a_m+b+u-q\cdot v$
if you use modulo $p$ on $e'_m$ to eliminate the noise $u$ , you still have to deal with some "noise" because of the part $q\cdot v$ (assume $p$ and $q$ are at least co-prime)
the part $q\cdot v$ reflects how large $b$ (and $u$ ) are (hence the formalization of noise behind these modulo operations)

JR (Dec 16 2024 at 15:02):

Not exactly on topic but perhaps worth noting that multilayer perceptrons with ReLU activations instantiate tropical polynomial maps. Meanwhile, transformer blocks at zero temperature are also (essentially?) algebraic maps. So large language models and their ilk at zero temperature are distributions formed from tropical polynomials.

David Egolf (Dec 16 2024 at 18:23):

Thanks everyone! All these comments look interesting! It will, however, take me some time to get to them all.

David Egolf (Dec 16 2024 at 18:26):

Simon Burton said:

From the nlab: "Tropical geometry is often thought of as algebraic geometry over the tropical semiring." This tropical semiring is the min-plus rig and is used in statistical inference, for example here: The Generalized Distributive Law...

I'm still working on reading this paper, but I already learned something pretty interesting from it! This paper tells me that one can form a "commutative semiring" (which we might also call a "commutative rig") from any distributive lattice. The nLab tells me that any Heyting algebra forms a distributive lattice. And "Sheaves in Geometry and Logic" tells me that if $B$ is any object in a (elementary?) topos $\mathcal{E}$ then the subobjects of $B$ form a Heyting algebra.

David Egolf (Dec 16 2024 at 18:28):

So, any object $B$ in any (elementary?) topos comes with an associated commutative rig, where the elements in the rig are the subobjects of $B$ !

John Baez (Dec 16 2024 at 18:29):

Yes, that's fun. I believe a distributive lattice is the same as a commutative rig where x + x = x and x $\times$ x = x for all x.

John Baez (Dec 16 2024 at 18:32):

The idea is that "or" is like +, but it obeys "x or x = x", and similarly "and" is like $\times$ , but it obeys "x and x = x".

David Egolf (Dec 16 2024 at 18:52):

That is very cool! So presumably we can do some amount of logic with the subobjects of an object of an elementary topos.

David Egolf (Dec 16 2024 at 18:54):

Wrapping back around to algebraic geometry, I guess we use this "logic" when talking about subsets of some set that vanish under different polynomial ideals.

David Egolf (Dec 16 2024 at 18:56):

In that setting, we have some nice relationships. For example, $V(a) \cap V(b) = V(a + b)$ where $V(a)$ is the subset of some set $X$ where some ideal $a$ of polynomials on $X$ vanishes, and $a+b$ is the ideal generated by $a$ and $b$ .

David Egolf (Dec 16 2024 at 18:58):

I wonder if one could try and develop this setup using subobjects of some object in a topos different from $\mathsf{Set}$ . So then we would presumably want this: $V(a) \land V(b) = V(a+b)$ , where $\land$ is the product in the lattice of subobjects of some object $X$ in some topos $\mathcal{E}$ .

David Egolf (Dec 16 2024 at 19:01):

For example, one might do this if one wished to study "curves" where each "curve" is a subgraph or a subpresheaf instead of a subset.

(This idea catches my attention in part because I have the idea that it may be interesting to represent "knowledge states" in the context of imaging as presheaves.)

David Egolf (Dec 16 2024 at 19:15):

Anyways, I should probably spend a lot more time on standard algebraic geometry before trying out something like this!

Madeleine Birchfield (Dec 17 2024 at 00:06):

John Baez said:

Yes, that's fun. I believe a distributive lattice is the same as a commutative rig where x + x = x and x $\times$ x = x for all x.

John Baez said:

The idea is that "or" is like +, but it obeys "x or x = x", and similarly "and" is like $\times$ , but it obeys "x and x = x".

Discussion on whether this is true or not has been moved to #learning: questions > commutative rigs, distributive lattices, etc.

However I think people in that thread have come up with a commutative rig where $x + x = x$ and $x \times x = x$ for all $x$ but which is not a distributive lattice. Apparently one needs to also have the condition that $1$ is absorbing for $+$ , which then automatically implies that $x + x = x$ .

David Egolf (Dec 17 2024 at 19:28):

I'm still working on reading the above paper "The Generalized Distributive Law". It's slow going because I keep getting interested in spin-off thoughts :laughing:.

So far I don't yet see how the tropical rig is important in this paper - I'm still looking forward to getting to that.

David Egolf (Dec 17 2024 at 19:29):

A couple interesting tidbits from the paper so far, though:

The authors point out that in some cases a distributive law can result in significant computational savings. $a(b+c) = ab+ac$ gives us two ways to compute the same thing - and the number of multiplications is reduced on the left-side.

David Egolf (Dec 17 2024 at 19:30):

(I suppose a similar thing potentially holds true for any equation that holds in an algebraic structure).

David Egolf (Dec 17 2024 at 19:31):

The authors have introduced the concept of a "marginalizing" in a more general setting than I was familiar with. Marginalizing is important when reasoning with probabilities, so I look forward to seeing where this goes!

John Baez (Dec 17 2024 at 19:35):

Re distributive laws between monads: they are directed, so for example if you have a rig freely generated by some elements $x_i$ you can write any element of the rig as sum of products of the $x_i$ , but not as a product of sums of the $x_i$ .

John Baez (Dec 17 2024 at 19:40):

In other words, "expanding" a product of sums is a systematic process, but "simplifying" a sum of products is not so straightforward.

David Egolf (Dec 18 2024 at 19:26):

John Baez said:

In other words, "expanding" a product of sums is a systematic process, but "simplifying" a sum of products is not so straightforward.

That makes sense! With regards to saving on computation, we can imagine that $a(b+c+d)$ is faster to compute (due to having fewer multiplications) compared to $ab+ac+ad$ . And it is very easy to rewrite $a(b+c+d)$ as $ab+ac+ad$ , but potentially much trickier to rewrite $ab+ac+ad$ as $a(b+c+d)$ . So if we want to save on computation, we need to work for it!

David Egolf (Jan 05 2025 at 20:27):

I want to work a bit on this thread again. I'm currently working towards learning how the tropical rig can be used in statistical interference, by reading the paper "The Generalized Distributive Law" linked above. (And then I want to take a look at some of the other interesting comments made above! There's so much to learn!)

David Egolf (Jan 05 2025 at 21:13):

I made a little more progress in reading that paper, and thought this was pretty interesting. The authors introduce the "MPF problem", which refers to a situation where one has a product of functions and then wishes to compute a "marginalized" version of it, by adding up all the values that function attains as some input ranges over all possible values.

David Egolf (Jan 05 2025 at 21:14):

This is interesting because various famous and useful things have a related MPF problem. For examples, the paper gives the Hadamard transform and the Fourier transform (over any finite abelian group). But I think that any (finite) convolution or cross-correlation will also give an example!

David Egolf (Jan 05 2025 at 21:15):

(Cross-correlation in particular is of interest to me, because it can play an important role in forming ultrasound images from "coded" transmissions. Basically, we send out a particular pattern of ultrasound and then look for that pattern in the echoes we get back using a cross-correlation.)

David Egolf (Jan 24 2025 at 23:00):

I continue to slowly work through this thread! I'm continuing to read in "The Generalized Distributive Law" because I want to see how the tropical rig can show up in the context of probabilistic reasoning.

The paper gives a interesting example relating to this, which I'll share here.

David Egolf (Jan 24 2025 at 23:03):

Let us imagine that we receive a seven digit binary sequence over some channel. One of several predefined "codes" were transmitted, but there is some probability that each transmitted digit got swapped to the opposite digit (i.e. 0->1 or 1-> 0) during transmission. So our goal is to figure out which code was transmitted, given the (possibly modified) code that we receive.

David Egolf (Jan 24 2025 at 23:04):

Given a transmitted code $x_1x_2 \dots x_7$ , we model the likelihood of having received $y_1y_2 \dots y_7$ as the product of the likelihood of receiving each digit individually: $p(y_1y_2 \dots y_7 | x_1x_2 \dots x_7) = \prod_i p(y_i | x_i)$ . Given the observed $y_1y_2 \dots y_7$ , our goal is to find the $x_1 \dots x_7$ that maximizes this probability.

David Egolf (Jan 24 2025 at 23:12):

To think about how to maximize the above, we can equivalently think about how to minimize $-\log(\prod_i p(y_i|x_i)) = \sum_i -\log(p(y_i|x_i))$ . We can then consider attempting to minimize this quantity with respect to only some of the $x_i$ , so that we get a function in the remaining variables.

For example, we might end up with this function:
$(x_2, x_6, x_7) \mapsto \min_{x_1,x_3,x_4,x_5}(\sum_i -\log(p(y_i|x_i)))$

David Egolf (Jan 24 2025 at 23:17):

We notice that this function involves the "addition" (minimum) and "multiplication" (regular addition) of the tropical rig.

David Egolf (Jan 24 2025 at 23:33):

(Note that for this optimization problem to be interesting, we have to restrict the binary sequences we are searching over to the valid codes. The paper achieves this by using parity check functions.

So, we probably instead want to minimize the following with respect to all the $x_i$ . Namely, we want to minimize $\sum_i -\log(p(y_i|x_i)) + \sum_j \chi_j(x_1, x_2,\dots, x_7)$ . Here each $\chi_j$ checks if a given sequence passes a particular parity check, outputting $\infty$ if not and $0$ if it is valid. So the $\chi_j$ together check if a code is valid. But at any rate, we are still seeing the tropical rig operations show up here.)

JR (Jan 25 2025 at 01:54):

Jumping in without checking history: have you seen https://www.pnas.org/doi/10.1073/pnas.0406010101

David Egolf (Jan 25 2025 at 03:19):

JR said:

Jumping in without checking history: have you seen https://www.pnas.org/doi/10.1073/pnas.0406010101

Thanks for the link! I have not read that article yet, but I'll plan to take a look at some point! On a related note, I recently found the article "Algebraic, tropical, and fuzzy geometry", which might be relevant too.

David Egolf (Feb 04 2025 at 01:03):

One last thought from "The Generalized Distributive Law": if we have multiple random variables and we only observe a value for some of them, then we are left with a "region" that holds all the still-possible values for the random variables.

Moving on from the (enjoyable and interesting) "The Generalized Distributive Law", I see the next comment above talks about the tropical rig in the context of statistical mechanics.

So I will take a look at this post next: The space of physical frameworks (part 1) !

David Egolf (Feb 05 2025 at 16:54):

That post describes a way to put a rig structure on $(-\infty, \infty]$ given a bijection $f_\beta:(-\infty, \infty] \to [0, \infty)$ . I'm now curious if this works in general!

Let $f:R \to S$ be a bijection and let $S$ be a rig. Can we make $R$ a rig as follows?

David Egolf (Feb 05 2025 at 16:55):

Set $r + r' = f^{-1}(f(r) + f(r'))$
Set $rr' = f^{-1}(f(r)f(r'))$

David Egolf (Feb 05 2025 at 16:57):

Since $f$ is a bijection, the proposed addition and multiplication are well-defined.

David Egolf (Feb 05 2025 at 16:58):

What should the zero element of $R$ be? We can try $0_R = f^{-1}(0_S)$ . Then $r + 0_R = f^{-1}(f(r) + f(0_R)) = f^{-1}(f(r) + 0_S) = f^{-1}(f(r)) = r$ as desired.

David Egolf (Feb 05 2025 at 17:02):

We can also try $1_R = f^{-1}(1_S)$ . Then $r1_R = f^{-1}(f(r)f(1_R)) = f^{-1}(f(r)1_S) = f^{-1}(f(r)) = r$ .

David Egolf (Feb 05 2025 at 17:04):

Next, we check associativity. $(r+r') + r'' = f^{-1}(f(r+r') + f(r''))$ . Since we have $f(r+r') = f(r) + f(r')$ , this becomes $f^{-1}(f(r) + f(r') + f(r'')) = f^{-1}(f(r) + (f(r') + f(r'')) = r + (r' + r'')$ . I think a similar argument should work for multiplication.

David Egolf (Feb 05 2025 at 17:07):

It remains to check distributivity. We have $r(r' + r'') = f^{-1}(f(r)f(r' + r'')) = f^{-1}(f(r)(f(r') + f(r'')) = f^{-1}(f(r)f(r') + f(r)f(r'')) = f^{-1}(f(rr') + f(rr'')) = rr' + rr''$ . Similarly, we should have $(r' + r'')r = r'r + r''r$ .

David Egolf (Feb 05 2025 at 17:09):

So it looks like we can indeed get a rig structure on $R$ in this way! That's pretty cool! I'm now wondering if this is just a special case of a more general strategy for inducing algebraic structures.

David Egolf (Feb 05 2025 at 17:11):

Hmm, I'm guessing that the rig we get $R$ is always isomorphic to the rig we started with $S$ . (I'm guessing that bijective rig morphisms are in fact isomorphisms in the category of rigs). If so, that makes this a bit less exciting.

David Egolf (Feb 05 2025 at 17:14):

However, the article manages to take some kind of limit (of the rigs we get in the above way as our bijection changes in a parametrized way) to get a new rig. That's interesting! I would not have expected that a bunch of isomorphic rigs could "approach" a different rig in some sense.

(The new rig we get is the tropical rig!)

John Baez (Feb 05 2025 at 17:32):

David Egolf said:

Let $f:R \to S$ be a bijection and let $S$ be a rig. Can we make $R$ a rig as follows?

That works not just for rigs but for anything mathematicians might want to make a set into: groups, fields, topological spaces, quasitriangular Hopf algebras over $\mathbb{F}_p$ , you name it.

Hmm, I'm guessing that the rig we get $R$ is always isomorphic to the rig we started with, $S$ .

Yes, that's how it works. We call this "transporting a structure along a bijection".

Most mathematicians take these general facts for granted after having checked them in 5 or 6 examples. But stating these general facts precisely and making them into a theorem is a job for category theorists!

I believe these fact amount to saying that if $\mathsf{C}$ is any 'reasonable" category of sets with extra structure, the forgetful functor $U: \mathsf{C} \to \mathsf{Set}$ is an [[isofibration]]. So then the challenge is to explain why forgetful functors tend to be isofibrations. I could say more....

However, the article manages to take some kind of limit (of the rigs we get in the above way as our bijection changes in a parametrized way) to get a new rig. That's interesting!

Yes, that's the only interesting thing about this business.

I would not have expected that a bunch of isomorphic rigs could "approach" a different rig in some sense.

This type of thing happens a lot, and there's a branch of math called deformation theory that studies it. In special relativity, for example, it doesn't really matter what the speed of light $c$ is: for any value of $c$ you get an isomorphic version of special relativity. But when you take the limit $c \to \infty$ you get a nonisomorphic theory, namely Newtonian mechanics.

It was noticed by Wigner in 1953 (if not sooner by someone else) that new theories of physics often have older theories as limits in this sense. All these theories are actually associated to mathematical structures - for example, most theories have symmetry groups associated to them. So Wigner developed a theory of how to take a limit of a bunch of isomorphic groups and get a different group.

I think my blog post listed a bunch of physical theories that reduce to other theories in some limit.

David Egolf (Feb 05 2025 at 19:59):

It's very cool to learn that deformation theory exists. It appears that it's closely related to algebraic geometry, which currently makes it hard for me to understand many of the examples. But the basic idea of something different somehow happening in a limit is pretty magical (in a good way)!

David Egolf (Feb 05 2025 at 20:03):

I believe these fact amount to saying that if $\mathsf{C}$ is any 'reasonable" category of sets with extra structure, the forgetful functor $U: \mathsf{C} \to \mathsf{Set}$ is an [[isofibration]]. So then the challenge is to explain why forgetful functors tend to be isofibrations. I could say more....

"Isofibration" is one of those words I've wanted to understand for a while. This might provide a great motivating example! I see the nLab says that an isofibration is roughly a functor $F:E \to B$ such that every isomorphism $f$ in $B$ can be "lifted" to an isomorphism in $E$ .

In our case, we have a forgetful functor $U:\mathsf{C} \to \mathsf{Set}$ , and we roughly want to be able to induce an isomorphism between objects in $\mathsf{C}$ given an isomorphism between their underlying sets. So if I have a structure $c$ with underlying set $U(c) \cong s$ for some set $s$ , then we hope we can lift this bijection to an isomorphism in $\mathsf{C}$ . We want to lift this $\mathsf{Set}$ -isomorphism to a $\mathsf{C}$ -isomorphism between $c$ and some preimage of $s$ under $U$ .

David Egolf (Feb 05 2025 at 20:08):

We could ask: when is the forgetful functor $U:C^T \to C$ for a monad $(T, \eta, \mu)$ an isofibration? Here $C^T$ is the Eilenberg-Moore category of the monad involving $T:C \to C$ , which I think can often be thought of as a category of "algebraic structures".

Kevin Carlson (Feb 05 2025 at 20:20):

If you're comfortable with the definition of $C^T,$ you might try to think through this in that generality for yourself!

John Baez (Feb 05 2025 at 20:47):

David Egolf said:

It's very cool to learn that deformation theory exists. It appears that it's closely related to algebraic geometry, which currently makes it hard for me to understand many of the examples.

That's just because algebraic geometers like studying things in continously varying families depending on some parameters, like ellipses

$x^2/a^2 + y^2/b^2 = 1$

which depend on two parameters $a,b$ . So, given anything they enjoy seeing how they can fit it into a family and 'deform' it, like taking a circle and squashing it into an ellipse. So naturally they infect deformation theory with their jargon and make it hard for other people to understand. :upside_down:

But physicists and other people also use deformation theory, often in blissful ignorance of the power tools of algebraic geometry. So there's stuff to read that doesn't assume algebraic geometry, but usually it's about some particular case of deformation theory.

But the basic idea of something different somehow happening in a limit is pretty magical (in a good way).

Indeed! This is how you get special relativity from general relativity, and classical mechanics from quantum mechanic and (I show in blog series) classical thermodynamics from statistical mechanics.

John Baez (Feb 05 2025 at 20:53):

David Egolf said:

We could ask: when is the forgetful functor $U:C^T \to C$ for a monad $(T, \eta, \mu)$ an isofibration? Here $C^T$ is the Eilenberg-Moore category of the monad involving $T:C \to C$ , which I think can often be thought of as a category of "algebraic structures".

As Kevin hinted, this is an excellent and not very hard question. Maybe the hardest part is remembering the definition of an isofibration! :wink:

John Baez (Feb 05 2025 at 21:27):

By the way, I tend to throw you lots of 'exercises' since you seem to have an appetite for them, but if you ever just want to know the answer (like to your question above), just ask. There's a limit to how many exercises anyone can do!

David Egolf (Feb 06 2025 at 19:12):

I like exercises/puzzles - they help me understand things! In this case, I'm feeling a bit intimidated by all the fancy words. But it is encouraging to know that this isn't supposed to be too hard... Let me see what I can come up with.

John Baez (Feb 06 2025 at 19:31):

I will bite my tongue, then.

David Egolf (Feb 06 2025 at 20:00):

I think I figured it out! Let $f:U(a) \cong c$ in $C$ be an isomorphism from $U(a)$ to $c$ . Here $U:C^T \to C$ is our forgetful functor. I believe that $U$ is always(!) an isofibration.

David Egolf (Feb 06 2025 at 20:00):

To show this, we need to find an isomorphism in $C^T$ that maps down to $f$ by $U$ . Such an isomorphism would be a commutative square of this form, where $f$ and $f'$ are both isomorphisms:
commutative square

David Egolf (Feb 06 2025 at 20:00):

If we set $f' = T(f)$ , we know that $f'$ is an isomorphism because functors preserve isomorphisms. And we can make the diagram commute by setting $b = f \circ a \circ T(f)^{-1}$ . This isomorphism maps down to $f$ under $U$ - as $U$ just grabs the lower part of this diagram - so I think $U$ is an isofibration!

David Egolf (Feb 06 2025 at 20:01):

(In the above example, we might think of $a:T(U(a)) \to U(a)$ as a group, $U(a)$ as its underlying set, and $c$ as a set.)

John Baez (Feb 06 2025 at 20:03):

Yes, that sounds right! Great! The formula $f \circ a \circ T(f)^{-1}$ generalizes how you might transport any binary operation $O: X^2 \to X$ along a bijection $f: X \to X'$ to get a new binary operation $O': X'^2 \to X'$ :

$O'(x,y) = f(O(f^{-1}(x), f^{-1}(y))$

John Baez (Feb 06 2025 at 20:09):

So this shows that when I said any 'reasonable' structure on a set (or indeed an object in any category) can be transported along bijections (or isomorphisms), this includes all structures that come from monads.

There are many 'reasonable' structures on sets that don't come from monads, like 'being a topological space'. So one might want an even more general theorem.

But let me instead describe an 'unreasonable' structure. Define a 'goofy structure' on sets like this:

If the set $X$ is $\{1,2,3,4,5,6,7,8\}$ , a goofy structure on $X$ is a way of making it into a group.

If the set $X$ is anything not equal to $\{1,2,3,4,5,6,7,8\}$ , a goofy structure is a way of making it into a field.

David Egolf (Feb 07 2025 at 18:30):

The "goofy structure" I suppose is "unreasonable" because the goofy structures we can put on a set can be essentially different even for isomorphic sets. So this concept is violating the principle of equivalence.

David Egolf (Feb 07 2025 at 18:31):

One might wonder how one can detect if a notion of structure is "reasonable" in the sense that it doesn't lead to essentially different structures on isomorphic objects.

Kevin Carlson (Feb 07 2025 at 18:33):

Well...how about, if the forgetful functor from the category of those structures is an isofibration? :grinning:

David Egolf (Feb 07 2025 at 18:34):

Yes! I was just about to propose something like that! That seems pretty reasonable to me.

Kevin Carlson (Feb 07 2025 at 18:35):

I'd say that's exactly the conceptual content of the notion of isofibration, so I think we're on the same page.

David Egolf (Feb 07 2025 at 18:36):

We could then say that a category of (reasonable) structures on the objects in a category $C$ is a category $D$ equipped with an isofibration $U:D \to C$ .

Kevin Carlson (Feb 07 2025 at 18:37):

I might just complicate that with the "stuff, structure, property" triumvirate--if $U$ is not faithful, I probably wouldn't want to call $D$ a category of structures on $C.$ But (one wonders!) surely there are non-faithful isofibrations.

Mike Shulman (Feb 07 2025 at 19:24):

Yes, e.g. the projection $\rm Set\times Set \to Set$ (either one) is an isofibration.

Kevin Carlson (Feb 07 2025 at 19:25):

Of course, that was just meant to be a temptation to David to think about it.

Kevin Carlson (Feb 07 2025 at 19:26):

Indeed, every functor is equivalent (in an appropriate sense) to an isofibration; one might enjoy learning about the canonical model structure on the category of small categories in this light.

Mike Shulman (Feb 07 2025 at 19:26):

Oh, sorry.

Kevin Carlson (Feb 07 2025 at 19:26):

No worries, my fault for being coy.

John Baez (Feb 08 2025 at 00:20):

David Egolf said:

The "goofy structure" I suppose is "unreasonable" because the goofy structures we can put on a set can be essentially different even for isomorphic sets. So this concept is violating the principle of equivalence.

Right! But here's an interesting twist: the concept of 'isofibration' itself violates the principle of equivalence. You can have two functors

$F, G : \mathsf{C} \to \mathsf{D}$

that are naturally isomorphic, with one being an isofibration and the other not. Let me see if I can make up an easy example. The identity functor

$1 : \mathsf{FinSet} \to \mathsf{FinSet}$

is an isofibration (since all identity functors are). But it's naturally isomorphic to a functor that sends all $n$ -element sets to the same $n$ -element set. (Yet another puzzle: see why?) This latter functor is clearly not an isofibration.

John Baez (Feb 08 2025 at 00:21):

So if you call the violation of the principle of equivalence "evil", and you think being an isofibration is not evil, i.e. "good", you're in the funny situation that it's evil to say a functor is good!

John Baez (Feb 08 2025 at 00:37):

In fact, every functor can be 'repaired' to get an isofibration, without changing anything important.

More precisely, given a functor

$F : C \to D$

we can make $C$ into a subcategory of a larger category $C'$ , where the inclusion

$i : C \to C'$

is an equivalence, and find an isofibration

$F': C' \to D$

such that

$F \cong F' \circ i$

John Baez (Feb 08 2025 at 00:40):

The idea is that we puff up the category $C$ to get a bigger but equivalent category $C'$ , which allows us to replace the functor $F: C \to D$ by a functor $F': C' \to D$ that's an isofibration, basically because $C'$ has enough isomorphisms to do all the lifting we want in the definition of 'isofibration'.

David Egolf (Feb 09 2025 at 22:53):

John Baez said:

But here's an interesting twist: the concept of 'isofibration' itself violates the principle of equivalence. You can have two functors

$F, G : \mathsf{C} \to \mathsf{D}$

that are naturally isomorphic, with one being an isofibration and the other not. Let me see if I can make up an easy example. The identity functor

$1 : \mathsf{FinSet} \to \mathsf{FinSet}$

is an isofibration (since all identity functors are). But it's naturally isomorphic to a functor that sends all $n$ -element sets to the same $n$ -element set. (Yet another puzzle: see why?) This latter functor is clearly not an isofibration.

That makes sense! (And I think I was able to work out the details of checking that we have a natural isomorphism between two such functors.)

David Egolf (Feb 09 2025 at 23:02):

John Baez said:

The idea is that we puff up the category $C$ to get a bigger but equivalent category $C'$ , which allows us to replace the functor $F: C \to D$ by a functor $F': C' \to D$ that's an isofibration, basically because $C'$ has enough isomorphisms to do all the lifting we want in the definition of 'isofibration'.

Interesting! Intuitively, I guess the repair process creates a version of $F$ that better respects the principle of equivalence, in terms of what is in its image.

John Baez (Feb 10 2025 at 00:25):

I think I was able to work out the details of checking that we have a natural isomorphism between two such functors.

Excellent! It's closely related to the idea of a [[skeleton]]: every category $\mathsf{C}$ is equivalent to a category $\mathsf{C}_0$ where isomorphic objects are equal, and you can set it up so that the functor $\mathsf{C}_0 \to \mathsf{C}$ in this equivalence is the inclusion of a subcategory, gotten by choosing one object from each isomorphism class of objects in $\mathsf{C}$ .

When we do this trick for $\mathsf{C} = \mathsf{FinSet}$ , we usually choose the objects of $\mathsf{C}_0$ to be the ordinals - our 'favorite' $n$ -element sets.