Category Theory
Zulip Server
Archive

You're reading the public-facing archive of the Category Theory Zulip server.
To join the server you need an invite. Anybody can get an invite by contacting Matteo Capucci at name dot surname at gmail dot com.
For all things related to this archive refer to the same person.


Stream: learning: questions

Topic: "probabilistic" algebraic geometry?


view this post on Zulip David Egolf (Dec 13 2024 at 01:56):

I've been working on learning some of the introductory ideas of algebraic geometry. To my understanding, in this context there is a duality between algebra and geometry:

In the extreme, there is a correspondence between maximal ideals (large and algebraic) and "smallest points" (small and geometric).

Intuitively, this seems like a manifestation of the duality between information and ambiguity. The more one knows about something, or the more that one observes something, the less ambiguity remains regarding what that something can possibly be. So if I pick some mystery geometric region and gradually tell you more and more polynomials that vanish on that region, then you will be able to gradually sharpen your estimate of my chosen mystery region.

Because I care about medical imaging, I care about this duality between information and ambiguity in the case where the information provided is imperfect. In this case, one's estimates of what is present are also imperfect. Motivated by this, I'm curious if algebraic geometry has been developed in the direction of exploring the duality between algebraic and geometric structures in the case where the algebra is "probabilistic" in some sense.

As an initial guess, perhaps one could try to define a probabilistic notion of a polynomial. Instead of being able to assess if a polynomial is zero or not at a point in question, we would instead be given the ability to assess the probability that a "probabilistic polynomial" is zero at some point in question.

To summarize, I have two specific questions:

view this post on Zulip John Baez (Dec 13 2024 at 02:23):

I'd be really interested if anyone has done something like this, but I feel you're stretching algebraic geometry in ways it doesn't want to go. Since you mention "information" and "probability", I'd be tempted instead to advertise information theory as the power tool for questions like yours.

Given two random variables $X$ and $Y$, we can mathematically define three quantities: the entropy $H(X)$, the conditional entropy $H(X|Y)$, and the mutual information $I(X;Y)$.

Then there's an equation

$$H(X|Y) = H(X) - I(X;Y)$$

This says how much learning the value of $Y$ reduces, on average, how much you don't know about the value of $X$. That's how I'd be tempted to formalize the "duality" you are sensing.

It takes time to understand these concepts, but anyone who wants to understand the universe needs to learn about them - and they are very general and beautiful. Luckily they are explained on Wikipedia (at the links), and even better in Cover and Thomas's book, Elements of Information Theory.

The equation I wrote down is equation (2.39) in this book.
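(A quick numerical check of that identity, as a minimal sketch - the joint distribution below is made up just for illustration, and isn't anything from the book:)

```python
import math

# A made-up joint distribution p(x, y) over two binary random variables.
p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

def H(dist):
    """Shannon entropy (in bits) of a distribution given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Marginals p(x) and p(y).
p_x = {x: sum(p for (x2, _), p in p_xy.items() if x2 == x) for x in (0, 1)}
p_y = {y: sum(p for (_, y2), p in p_xy.items() if y2 == y) for y in (0, 1)}

H_joint, H_x, H_y = H(p_xy), H(p_x), H(p_y)
H_x_given_y = H_joint - H_y          # conditional entropy H(X|Y)
I_xy = H_x + H_y - H_joint           # mutual information I(X;Y)

# The identity H(X|Y) = H(X) - I(X;Y): both sides agree.
print(H_x_given_y, H_x - I_xy)
```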

view this post on Zulip David Egolf (Dec 13 2024 at 02:34):

Thanks for reminding me of those concepts! Back in engineering grad school, I spent a little bit of time using these concepts to try and think about ultrasound imaging. I even spent some time with that book by Cover and Thomas. Ultimately other work was progressing towards publication more quickly and so we abandoned the idea. But it was interesting stuff! It might be fun to go back to it, now that I know some more math...

(We were specifically playing around with these concepts in simulation, trying to see if it was possible to optimize the length of an ultrasound pulse in 1D, given some statistical assumptions about the medium we're transmitting it through. (The longer the transmitted pulse, the stronger your resulting observed signal and so the more robust you are to noise, but also the more autocorrelation crosstalk you get.) I forget exactly what we were optimizing, though... presumably we were trying to minimize the conditional entropy regarding the target given our resulting observations!?)

view this post on Zulip John Baez (Dec 13 2024 at 02:44):

Btw, if you ever want to see how entropy emerges naturally from category theory, there's an explanation in Tom Leinster's free book on entropy and diversity.

And I have my own free book on entropy in physics. There's remarkably little overlap in these three books, and a lot of basic stuff not covered by any of them, but Cover and Thomas is the one you want for the issues I was just mentioning.

view this post on Zulip Joe Moeller (Dec 13 2024 at 21:59):

I'll mention algebraic statistics, without thinking too hard about whether it's truly an answer to the question.

view this post on Zulip John Baez (Dec 13 2024 at 22:13):

The section "Recent work using commutative algebra and algebraic geometry" might help @David Egolf come up with a good answer to his question, but I'm afraid it will take work: it's certainly not handing out anything on a silver platter.

view this post on Zulip Simon Burton (Dec 14 2024 at 13:11):

From the nlab: "Tropical geometry is often thought of as algebraic geometry over the tropical semiring." This tropical semiring is the min-plus rig and is used in statistical inference, for example here: The Generalized Distributive Law...

view this post on Zulip David Egolf (Dec 15 2024 at 17:55):

Joe Moeller said:

I'll mention algebraic statistics, without thinking too hard about whether it's truly an answer to the question.

Thanks for linking that page! On that page, there are several things that look interesting to me. For example, from that article I found a paper by David Mumford called "Pattern Theory: The Mathematics of Perception".

view this post on Zulip David Egolf (Dec 15 2024 at 18:06):

The section "Recent work using commutative algebra and algebraic geometry" is also interesting, and probably relevant to what I asked about in this thread.

Roughly, the idea sketched in that section appears to me to be this: If we have a discrete probability distribution determined by some unknown parameter, then at least in some cases the individual probability values in each possible distribution are those that satisfy some polynomial equation. So we can then study our collection of possible probability distributions by studying where a certain polynomial goes to zero - which can presumably be done using algebraic geometry.
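(As a concrete toy example of that idea - a standard one from algebraic statistics, not something taken from the linked page - the 2x2 independence model is exactly the set of joint distributions on which a certain polynomial vanishes:)

```python
import itertools

# Toy example: X and Y, each valued in {0, 1}, are independent exactly when the
# joint probabilities satisfy the polynomial equation p00*p11 - p01*p10 = 0.
# So the "independence model" is the part of this polynomial's zero set that
# lies in the probability simplex.

def joint_from_marginals(qx, qy):
    """Joint distribution of independent X, Y with the given marginals."""
    return {(x, y): qx[x] * qy[y] for x, y in itertools.product((0, 1), repeat=2)}

p = joint_from_marginals({0: 0.3, 1: 0.7}, {0: 0.9, 1: 0.1})

# The defining polynomial vanishes on every independent distribution:
print(p[0, 0] * p[1, 1] - p[0, 1] * p[1, 0])  # 0.0 (up to floating point error)
```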

view this post on Zulip David Egolf (Dec 15 2024 at 18:07):

In this approach, it appears that we construct (standard) polynomials of interest from some probabilistic thing of interest. That's not a perfect match with what I was wondering about in this thread, but it's still interesting.

view this post on Zulip David Egolf (Dec 15 2024 at 18:35):

Simon Burton said:

From the nlab: "Tropical geometry is often thought of as algebraic geometry over the tropical semiring." This tropical semiring is the min-plus rig and is used in statistical inference, for example here: The Generalized Distributive Law...

This also looks quite interesting! I'll plan to leave a lengthier comment here once I've spent some time looking at this paper.

view this post on Zulip John Baez (Dec 15 2024 at 22:56):

A bunch of people have written about how the tropical rig is the $T \to 0$ limit of a rig that's important in statistical mechanics at temperature $T$. I first learned about this from James Dolan and I could easily bloviate about it at length. I wrote a short version here: The space of physical frameworks (part 1).

view this post on Zulip John Baez (Dec 15 2024 at 23:12):

But one really broad lesson is that a lot of algebraic geometry can be generalized from commutative rings to commutative rigs and even more general commutative Lawvere theories - Lawvere theories where all the operations "commute" in a suitable sense. (It takes a while to figure out what it means for a 3-ary operation and a 5-ary operation to "commute", but the answer is visually satisfying.) Nikolai Durov has a mammoth book generalizing lots of algebraic geometry (e.g. schemes) from commutative rings to commutative Lawvere theories, but part of why this book is so huge is that he assumes the reader knows little of category theory, and proceeds at a slow pace.

view this post on Zulip Rémy Tuyéras (Dec 16 2024 at 13:11):

In cryptography, polynomials are used to encrypt data. More specifically, cryptographers use ideals to introduce noise (randomness) into the encrypted data. To draw an analogy with your problem, @David Egolf, encrypted data can be compared to raw data awaiting analysis.

Below, I will denote by $m$ some valuable information that one wants to discover. In the context of encryption, $m$ is a message that an attacker wants to discover, but maybe in your case @David Egolf, $m$ is some information that you want to discover through some analysis (just to give some ideas here).

Now, the idea is to have a polynomial $e_m$ that encrypts/hides the information $m$. In general, this polynomial will be considered modulo a certain polynomial $q$ such that it makes sense to denote $e_m' := e_m \,(\mathsf{mod}\,q)$. An important detail to keep in mind here is that the inequality $e_m' \neq e_m$ will likely hold with high probability. However, we assume that $m$ is small enough to have $m = m \,(\mathsf{mod}\,q)$.

The polynomial $e_m$ is usually encoded as a sum $a_m + b$ such that $m = a_m \,(\mathsf{mod}\,q)$. This means that if you know $b$, then you can discover $m = (e_m - b) \,(\mathsf{mod}\,q)$ using the subtraction $e_m' - b \,(\mathsf{mod}\,q)$. In other words, the quantity $b$ will likely be the key focus of your research analysis in order to learn $m$.

Still, encryption doesn't stop there. The encryption $e_m$ is usually hidden by an extra polynomial $u$ (this is the noise), which would belong to an ideal of the form $(p)$ where $p$ is some other polynomial. In other words, we have $e_m = a_m + b + u$ where $u \,(\mathsf{mod}\,p) = 0$.

In practice, we also assume that $m$ is small enough such that $m = m \,(\mathsf{mod}\,p)$.

Depending on the size of the noise $u$, two scenarios can occur:

view this post on Zulip Rémy Tuyéras (Dec 16 2024 at 13:24):

A few remarks:

view this post on Zulip JR (Dec 16 2024 at 15:02):

Not exactly on topic but perhaps worth noting that multilayer perceptrons with ReLU activations instantiate tropical polynomial maps. Meanwhile, transformer blocks at zero temperature are also (essentially?) algebraic maps. So large language models and their ilk at zero temperature are distributions formed from tropical polynomials.

view this post on Zulip David Egolf (Dec 16 2024 at 18:23):

Thanks everyone! All these comments look interesting! It will, however, take me some time to get to them all.

view this post on Zulip David Egolf (Dec 16 2024 at 18:26):

Simon Burton said:

From the nlab: "Tropical geometry is often thought of as algebraic geometry over the tropical semiring." This tropical semiring is the min-plus rig and is used in statistical inference, for example here: The Generalized Distributive Law...

I'm still working on reading this paper, but I already learned something pretty interesting from it! This paper tells me that one can form a "commutative semiring" (which we might also call a "commutative rig") from any distributive lattice. The nLab tells me that any Heyting algebra forms a distributive lattice. And "Sheaves in Geometry and Logic" tells me that if $B$ is any object in an (elementary?) topos $\mathcal{E}$ then the subobjects of $B$ form a Heyting algebra.

view this post on Zulip David Egolf (Dec 16 2024 at 18:28):

So, any object $B$ in any (elementary?) topos comes with an associated commutative rig, where the elements in the rig are the subobjects of $B$!

view this post on Zulip John Baez (Dec 16 2024 at 18:29):

Yes, that's fun. I believe a distributive lattice is the same as a commutative rig where $x + x = x$ and $x \times x = x$ for all $x$.

view this post on Zulip John Baez (Dec 16 2024 at 18:32):

The idea is that "or" is like $+$, but it obeys "x or x = x", and similarly "and" is like $\times$, but it obeys "x and x = x".
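(Here's a brute-force check of that, as a small sketch: take the lattice of subsets of a 3-element set - standing in for a subobject lattice - with union as $+$ and intersection as $\times$, and verify the commutative rig axioms together with the idempotence laws:)

```python
from itertools import chain, combinations

X = frozenset({1, 2, 3})
subsets = [frozenset(s) for s in chain.from_iterable(combinations(X, n) for n in range(4))]

add = lambda a, b: a | b   # "addition" is union ("or")
mul = lambda a, b: a & b   # "multiplication" is intersection ("and")
zero, one = frozenset(), X

for a in subsets:
    assert add(a, zero) == a and mul(a, one) == a and mul(a, zero) == zero
    assert add(a, a) == a and mul(a, a) == a            # idempotence: x + x = x, x * x = x
    for b in subsets:
        assert add(a, b) == add(b, a) and mul(a, b) == mul(b, a)
        for c in subsets:
            assert add(add(a, b), c) == add(a, add(b, c))
            assert mul(mul(a, b), c) == mul(a, mul(b, c))
            assert mul(a, add(b, c)) == add(mul(a, b), mul(a, c))   # distributivity

print("all commutative rig axioms hold on the powerset of", set(X))
```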

view this post on Zulip David Egolf (Dec 16 2024 at 18:52):

That is very cool! So presumably we can do some amount of logic with the subobjects of an object of an elementary topos.

view this post on Zulip David Egolf (Dec 16 2024 at 18:54):

Wrapping back around to algebraic geometry, I guess we use this "logic" when talking about the subsets of some set on which different polynomial ideals vanish.

view this post on Zulip David Egolf (Dec 16 2024 at 18:56):

In that setting, we have some nice relationships. For example, $V(a) \cap V(b) = V(a + b)$, where $V(a)$ is the subset of some set $X$ where some ideal $a$ of polynomials on $X$ vanishes, and $a + b$ is the ideal generated by $a$ and $b$.
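(For a standard concrete instance of this, not from the thread: take $a = (x)$ and $b = (y)$ in $k[x, y]$. Then $V(a)$ is the $y$-axis, $V(b)$ is the $x$-axis, $a + b = (x, y)$, and indeed $V(a) \cap V(b) = \{(0,0)\} = V(a + b)$.)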

view this post on Zulip David Egolf (Dec 16 2024 at 18:58):

I wonder if one could try and develop this setup using subobjects of some object in a topos different from $\mathsf{Set}$. So then we would presumably want this: $V(a) \land V(b) = V(a + b)$, where $\land$ is the product in the lattice of subobjects of some object $X$ in some topos $\mathcal{E}$.

view this post on Zulip David Egolf (Dec 16 2024 at 19:01):

For example, one might do this if one wished to study "curves" where each "curve" is a subgraph or a subpresheaf instead of a subset.

(This idea catches my attention in part because I have the idea that it may be interesting to represent "knowledge states" in the context of imaging as presheaves.)

view this post on Zulip David Egolf (Dec 16 2024 at 19:15):

Anyways, I should probably spend a lot more time on standard algebraic geometry before trying out something like this!

view this post on Zulip Madeleine Birchfield (Dec 17 2024 at 00:06):

John Baez said:

Yes, that's fun. I believe a distributive lattice is the same as a commutative rig where $x + x = x$ and $x \times x = x$ for all $x$.

John Baez said:

The idea is that "or" is like $+$, but it obeys "x or x = x", and similarly "and" is like $\times$, but it obeys "x and x = x".

Discussion on whether this is true or not has been moved to #learning: questions > commutative rigs, distributive lattices, etc.

However I think people in that thread have come up with a commutative rig where $x + x = x$ and $x \times x = x$ for all $x$ but which is not a distributive lattice. Apparently one needs to also have the condition that $1$ is absorbing for $+$, which then automatically implies that $x + x = x$.

view this post on Zulip David Egolf (Dec 17 2024 at 19:28):

I'm still working on reading the above paper "The Generalized Distributive Law". It's slow going because I keep getting interested in spin-off thoughts :laughing:.

So far I don't yet see how the tropical rig is important in this paper - I'm still looking forward to getting to that.

view this post on Zulip David Egolf (Dec 17 2024 at 19:29):

A couple interesting tidbits from the paper so far, though:

view this post on Zulip David Egolf (Dec 17 2024 at 19:30):

(I suppose a similar thing potentially holds true for any equation that holds in an algebraic structure).

view this post on Zulip David Egolf (Dec 17 2024 at 19:31):

view this post on Zulip John Baez (Dec 17 2024 at 19:35):

Re distributive laws between monads: they are directed, so for example if you have a rig freely generated by some elements $x_i$ you can write any element of the rig as a sum of products of the $x_i$, but not as a product of sums of the $x_i$.

view this post on Zulip John Baez (Dec 17 2024 at 19:40):

In other words, "expanding" a product of sums is a systematic process, but "simplifying" a sum of products is not so straightforward.

view this post on Zulip David Egolf (Dec 18 2024 at 19:26):

John Baez said:

In other words, "expanding" a product of sums is a systematic process, but "simplifying" a sum of products is not so straightforward.

That makes sense! With regards to saving on computation, we can imagine that $a(b+c+d)$ is faster to compute (due to having fewer multiplications) compared to $ab+ac+ad$. And it is very easy to rewrite $a(b+c+d)$ as $ab+ac+ad$, but potentially much trickier to rewrite $ab+ac+ad$ as $a(b+c+d)$. So if we want to save on computation, we need to work for it!
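(A tiny SymPy illustration of that asymmetry, just as a sketch: expanding is a single mechanical rewrite, while refactoring is a genuine search that `factor` has to work for:)

```python
import sympy as sp

a, b, c, d = sp.symbols('a b c d')

factored = a * (b + c + d)
expanded = sp.expand(factored)    # mechanical: a*b + a*c + a*d
refactored = sp.factor(expanded)  # has to search for the common structure again

print(expanded)    # a*b + a*c + a*d
print(refactored)  # a*(b + c + d)
```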

view this post on Zulip David Egolf (Jan 05 2025 at 20:27):

I want to work a bit on this thread again. I'm currently working towards learning how the tropical rig can be used in statistical inference, by reading the paper "The Generalized Distributive Law" linked above. (And then I want to take a look at some of the other interesting comments made above! There's so much to learn!)

view this post on Zulip David Egolf (Jan 05 2025 at 21:13):

I made a little more progress in reading that paper, and thought this was pretty interesting. The authors introduce the "MPF problem" ("marginalize a product function"), which refers to a situation where one has a product of functions and wishes to compute a "marginalized" version of it, by adding up the values that the product attains as some of its inputs range over all their possible values.
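(A minimal sketch of such an MPF instance - my own toy example, not the paper's - together with the distributive-law rearrangement that makes it cheaper:)

```python
import itertools

# MPF instance: the global function is a product of local factors f(x1, x2) * g(x2, x3),
# each variable ranging over {0, 1, 2}, and we want the marginalization
#   h(x1) = sum over x2, x3 of f(x1, x2) * g(x2, x3).
vals = range(3)
f = {(x1, x2): x1 + 2 * x2 + 1 for x1 in vals for x2 in vals}
g = {(x2, x3): x2 * x3 + 1 for x2 in vals for x3 in vals}

# Naive: sum the product over the whole joint domain.
h_naive = {x1: sum(f[x1, x2] * g[x2, x3] for x2, x3 in itertools.product(vals, vals))
           for x1 in vals}

# Generalized distributive law: push the sum over x3 inside first,
#   h(x1) = sum_x2 f(x1, x2) * (sum_x3 g(x2, x3)),
# which uses fewer multiplications.
g_marg = {x2: sum(g[x2, x3] for x3 in vals) for x2 in vals}
h_gdl = {x1: sum(f[x1, x2] * g_marg[x2] for x2 in vals) for x1 in vals}

assert h_naive == h_gdl
print(h_gdl)
```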

view this post on Zulip David Egolf (Jan 05 2025 at 21:14):

This is interesting because various famous and useful things have a related MPF problem. As examples, the paper gives the Hadamard transform and the Fourier transform (over any finite abelian group). But I think that any (finite) convolution or cross-correlation will also give an example!

view this post on Zulip David Egolf (Jan 05 2025 at 21:15):

(Cross-correlation in particular is of interest to me, because it can play an important role in forming ultrasound images from "coded" transmissions. Basically, we send out a particular pattern of ultrasound and then look for that pattern in the echoes we get back using a cross-correlation.)

view this post on Zulip David Egolf (Jan 24 2025 at 23:00):

I continue to slowly work through this thread! I'm continuing to read in "The Generalized Distributive Law" because I want to see how the tropical rig can show up in the context of probabilistic reasoning.

The paper gives an interesting example relating to this, which I'll share here.

view this post on Zulip David Egolf (Jan 24 2025 at 23:03):

Let us imagine that we receive a seven-digit binary sequence over some channel. One of several predefined "codes" was transmitted, but there is some probability that each transmitted digit got swapped to the opposite digit (i.e. 0 -> 1 or 1 -> 0) during transmission. So our goal is to figure out which code was transmitted, given the (possibly modified) sequence that we receive.

view this post on Zulip David Egolf (Jan 24 2025 at 23:04):

Given a transmitted code $x_1 x_2 \dots x_7$, we model the likelihood of having received $y_1 y_2 \dots y_7$ as the product of the likelihoods of receiving each digit individually: $p(y_1 y_2 \dots y_7 \mid x_1 x_2 \dots x_7) = \prod_i p(y_i \mid x_i)$. Given the observed $y_1 y_2 \dots y_7$, our goal is to find the $x_1 \dots x_7$ that maximizes this probability.

view this post on Zulip David Egolf (Jan 24 2025 at 23:12):

To think about how to maximize the above, we can equivalently think about how to minimize $-\log(\prod_i p(y_i|x_i)) = \sum_i -\log(p(y_i|x_i))$. We can then consider attempting to minimize this quantity with respect to only some of the $x_i$, so that we get a function in the remaining variables.

For example, we might end up with this function:

$$(x_2, x_6, x_7) \mapsto \min_{x_1,x_3,x_4,x_5}\Big(\sum_i -\log(p(y_i|x_i))\Big)$$

view this post on Zulip David Egolf (Jan 24 2025 at 23:17):

We notice that this function involves the "addition" (minimum) and "multiplication" (regular addition) of the tropical rig.

view this post on Zulip David Egolf (Jan 24 2025 at 23:33):

(Note that for this optimization problem to be interesting, we have to restrict the binary sequences we are searching over to the valid codes. The paper achieves this by using parity check functions.

So, we probably instead want to minimize the following with respect to all the $x_i$. Namely, we want to minimize $\sum_i -\log(p(y_i|x_i)) + \sum_j \chi_j(x_1, x_2, \dots, x_7)$. Here each $\chi_j$ checks if a given sequence passes a particular parity check, outputting $\infty$ if not and $0$ if it is valid. So the $\chi_j$ together check if a code is valid. But at any rate, we are still seeing the tropical rig operations show up here.)
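(A brute-force sketch of this kind of decoding - my own toy version, using a made-up 3-bit repetition code and a made-up flip probability rather than the paper's 7-bit code:)

```python
import math

# Toy channel: each transmitted bit is flipped independently with probability 0.1.
flip = 0.1
def neg_log_lik(y, x):
    """-log p(y|x) for a single received bit y given a transmitted bit x."""
    return -math.log(flip if y != x else 1 - flip)

# Stand-in code: the 3-bit repetition code {000, 111} plays the role of the 7-bit code.
valid_codes = [(0, 0, 0), (1, 1, 1)]
received = (1, 0, 1)  # the (possibly corrupted) sequence we observed

# Decode by minimizing sum_i -log p(y_i|x_i) over valid codewords.  The "multiplication"
# of the tropical rig is ordinary +, and its "addition" is the min we take over codes.
best = min(valid_codes,
           key=lambda x: sum(neg_log_lik(y, xi) for y, xi in zip(received, x)))
print(best)  # (1, 1, 1): the most likely transmitted codeword
```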

view this post on Zulip JR (Jan 25 2025 at 01:54):

Jumping in without checking history: have you seen https://www.pnas.org/doi/10.1073/pnas.0406010101

view this post on Zulip David Egolf (Jan 25 2025 at 03:19):

JR said:

Jumping in without checking history: have you seen https://www.pnas.org/doi/10.1073/pnas.0406010101

Thanks for the link! I have not read that article yet, but I'll plan to take a look at some point! On a related note, I recently found the article "Algebraic, tropical, and fuzzy geometry", which might be relevant too.

view this post on Zulip David Egolf (Feb 04 2025 at 01:03):

One last thought from "The Generalized Distributive Law": if we have multiple random variables and we only observe a value for some of them, then we are left with a "region" that holds all the still-possible values for the random variables.

Moving on from the (enjoyable and interesting) "The Generalized Distributive Law", I see the next comment above talks about the tropical rig in the context of statistical mechanics.

So I will take a look at this post next: The space of physical frameworks (part 1) !

view this post on Zulip David Egolf (Feb 05 2025 at 16:54):

That post describes a way to put a rig structure on $(-\infty, \infty]$ given a bijection $f_\beta : (-\infty, \infty] \to [0, \infty)$. I'm now curious if this works in general!

Let $f : R \to S$ be a bijection and let $S$ be a rig. Can we make $R$ a rig as follows?

view this post on Zulip David Egolf (Feb 05 2025 at 16:55):

We can try defining $r + r' := f^{-1}(f(r) + f(r'))$ and $r \cdot r' := f^{-1}(f(r) \cdot f(r'))$ for all $r, r' \in R$.

view this post on Zulip David Egolf (Feb 05 2025 at 16:57):

Since $f$ is a bijection, the proposed addition and multiplication are well-defined.

view this post on Zulip David Egolf (Feb 05 2025 at 16:58):

What should the zero element of $R$ be? We can try $0_R = f^{-1}(0_S)$. Then $r + 0_R = f^{-1}(f(r) + f(0_R)) = f^{-1}(f(r) + 0_S) = f^{-1}(f(r)) = r$ as desired.

view this post on Zulip David Egolf (Feb 05 2025 at 17:02):

We can also try $1_R = f^{-1}(1_S)$. Then $r \cdot 1_R = f^{-1}(f(r) f(1_R)) = f^{-1}(f(r) \cdot 1_S) = f^{-1}(f(r)) = r$.

view this post on Zulip David Egolf (Feb 05 2025 at 17:04):

Next, we check associativity. $(r + r') + r'' = f^{-1}(f(r + r') + f(r''))$. Since we have $f(r + r') = f(r) + f(r')$, this becomes $f^{-1}((f(r) + f(r')) + f(r'')) = f^{-1}(f(r) + (f(r') + f(r''))) = r + (r' + r'')$. I think a similar argument should work for multiplication.

view this post on Zulip David Egolf (Feb 05 2025 at 17:07):

It remains to check distributivity. We have $r(r' + r'') = f^{-1}(f(r) f(r' + r'')) = f^{-1}(f(r)(f(r') + f(r''))) = f^{-1}(f(r) f(r') + f(r) f(r'')) = f^{-1}(f(rr') + f(rr'')) = rr' + rr''$. Similarly, we should have $(r' + r'') r = r'r + r''r$.

view this post on Zulip David Egolf (Feb 05 2025 at 17:09):

So it looks like we can indeed get a rig structure on $R$ in this way! That's pretty cool! I'm now wondering if this is just a special case of a more general strategy for inducing algebraic structures.

view this post on Zulip David Egolf (Feb 05 2025 at 17:11):

Hmm, I'm guessing that the rig $R$ we get is always isomorphic to the rig $S$ we started with. (I'm guessing that bijective rig morphisms are in fact isomorphisms in the category of rigs.) If so, that makes this a bit less exciting.

view this post on Zulip David Egolf (Feb 05 2025 at 17:14):

However, the article manages to take some kind of limit (of the rigs we get in the above way as our bijection changes in a parametrized way) to get a new rig. That's interesting! I would not have expected that a bunch of isomorphic rigs could "approach" a different rig in some sense.

(The new rig we get is the tropical rig!)
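(A small numerical sketch of that limit - my own illustration, assuming the bijection is $f_\beta(x) = e^{-\beta x}$, which is in the spirit of the blog post but may differ from its exact conventions. Transporting ordinary addition on $[0, \infty)$ back along $f_\beta$ gives $x +_\beta y = -\tfrac{1}{\beta}\log(e^{-\beta x} + e^{-\beta y})$, and as $\beta \to \infty$ this tends to $\min(x, y)$, the tropical "addition":)

```python
import math

def transported_add(x, y, beta):
    """Addition on (-inf, inf] transported along f_beta(x) = exp(-beta * x).
    Computed stably as min(x, y) + log1p(exp(-beta*|x - y|))/(-beta),
    which equals -log(exp(-beta*x) + exp(-beta*y)) / beta."""
    return min(x, y) - math.log1p(math.exp(-beta * abs(x - y))) / beta

x, y = 2.0, 3.0
for beta in (1, 10, 100, 1000):
    print(beta, transported_add(x, y, beta))
# As beta grows, the transported "sum" approaches min(x, y) = 2.0:
# a family of mutually isomorphic rigs converging to the (non-isomorphic) tropical rig.
```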

view this post on Zulip John Baez (Feb 05 2025 at 17:32):

David Egolf said:

Let $f : R \to S$ be a bijection and let $S$ be a rig. Can we make $R$ a rig as follows?

That works not just for rigs but for anything mathematicians might want to make a set into: groups, fields, topological spaces, quasitriangular Hopf algebras over $\mathbb{F}_p$, you name it.

Hmm, I'm guessing that the rig $R$ we get is always isomorphic to the rig $S$ we started with.

Yes, that's how it works. We call this "transporting a structure along a bijection".

Most mathematicians take these general facts for granted after having checked them in 5 or 6 examples. But stating these general facts precisely and making them into a theorem is a job for category theorists!

I believe these facts amount to saying that if $\mathsf{C}$ is any 'reasonable' category of sets with extra structure, the forgetful functor $U : \mathsf{C} \to \mathsf{Set}$ is an [[isofibration]]. So then the challenge is to explain why forgetful functors tend to be isofibrations. I could say more....

However, the article manages to take some kind of limit (of the rigs we get in the above way as our bijection changes in a parametrized way) to get a new rig. That's interesting!

Yes, that's the only interesting thing about this business.

I would not have expected that a bunch of isomorphic rigs could "approach" a different rig in some sense.

This type of thing happens a lot, and there's a branch of math called deformation theory that studies it. In special relativity, for example, it doesn't really matter what the speed of light $c$ is: for any value of $c$ you get an isomorphic version of special relativity. But when you take the limit $c \to \infty$ you get a nonisomorphic theory, namely Newtonian mechanics.

It was noticed by Wigner in 1953 (if not sooner by someone else) that new theories of physics often have older theories as limits in this sense. All these theories are actually associated to mathematical structures - for example, most theories have symmetry groups associated to them. So Wigner developed a theory of how to take a limit of a bunch of isomorphic groups and get a different group.

I think my blog post listed a bunch of physical theories that reduce to other theories in some limit.

view this post on Zulip David Egolf (Feb 05 2025 at 19:59):

It's very cool to learn that deformation theory exists. It appears that it's closely related to algebraic geometry, which currently makes it hard for me to understand many of the examples. But the basic idea of something different somehow happening in a limit is pretty magical (in a good way)!

view this post on Zulip David Egolf (Feb 05 2025 at 20:03):

I believe these facts amount to saying that if $\mathsf{C}$ is any 'reasonable' category of sets with extra structure, the forgetful functor $U : \mathsf{C} \to \mathsf{Set}$ is an [[isofibration]]. So then the challenge is to explain why forgetful functors tend to be isofibrations. I could say more....

"Isofibration" is one of those words I've wanted to understand for a while. This might provide a great motivating example! I see the nLab says that an isofibration is roughly a functor F:EBF:E \to B such that every isomorphism ff in BB can be "lifted" to an isomorphism in EE.

In our case, we have a forgetful functor $U : \mathsf{C} \to \mathsf{Set}$, and we roughly want to be able to induce an isomorphism between objects in $\mathsf{C}$ given an isomorphism between their underlying sets. So if I have a structure $c$ with underlying set $U(c) \cong s$ for some set $s$, then we hope we can lift this bijection to an isomorphism in $\mathsf{C}$. We want to lift this $\mathsf{Set}$-isomorphism to a $\mathsf{C}$-isomorphism between $c$ and some preimage of $s$ under $U$.

view this post on Zulip David Egolf (Feb 05 2025 at 20:08):

We could ask: when is the forgetful functor $U : C^T \to C$ for a monad $(T, \eta, \mu)$ an isofibration? Here $C^T$ is the Eilenberg-Moore category of the monad involving $T : C \to C$, which I think can often be thought of as a category of "algebraic structures".

view this post on Zulip Kevin Carlson (Feb 05 2025 at 20:20):

If you're comfortable with the definition of $C^T$, you might try to think through this in that generality for yourself!

view this post on Zulip John Baez (Feb 05 2025 at 20:47):

David Egolf said:

It's very cool to learn that deformation theory exists. It appears that it's closely related to algebraic geometry, which currently makes it hard for me to understand many of the examples.

That's just because algebraic geometers like studying things in continuously varying families depending on some parameters, like ellipses

$$x^2/a^2 + y^2/b^2 = 1$$

which depend on two parameters $a, b$. So, given anything, they enjoy seeing how they can fit it into a family and 'deform' it, like taking a circle and squashing it into an ellipse. So naturally they infect deformation theory with their jargon and make it hard for other people to understand. :upside_down:

But physicists and other people also use deformation theory, often in blissful ignorance of the power tools of algebraic geometry. So there's stuff to read that doesn't assume algebraic geometry, but usually it's about some particular case of deformation theory.

But the basic idea of something different somehow happening in a limit is pretty magical (in a good way).

Indeed! This is how you get special relativity from general relativity, and classical mechanics from quantum mechanics, and (as I show in my blog series) classical thermodynamics from statistical mechanics.

view this post on Zulip John Baez (Feb 05 2025 at 20:53):

David Egolf said:

We could ask: when is the forgetful functor $U : C^T \to C$ for a monad $(T, \eta, \mu)$ an isofibration? Here $C^T$ is the Eilenberg-Moore category of the monad involving $T : C \to C$, which I think can often be thought of as a category of "algebraic structures".

As Kevin hinted, this is an excellent and not very hard question. Maybe the hardest part is remembering the definition of an isofibration! :wink:

view this post on Zulip John Baez (Feb 05 2025 at 21:27):

By the way, I tend to throw you lots of 'exercises' since you seem to have an appetite for them, but if you ever just want to know the answer (like to your question above), just ask. There's a limit to how many exercises anyone can do!

view this post on Zulip David Egolf (Feb 06 2025 at 19:12):

I like exercises/puzzles - they help me understand things! In this case, I'm feeling a bit intimidated by all the fancy words. But it is encouraging to know that this isn't supposed to be too hard... Let me see what I can come up with.

view this post on Zulip John Baez (Feb 06 2025 at 19:31):

I will bite my tongue, then.

view this post on Zulip David Egolf (Feb 06 2025 at 20:00):

I think I figured it out! Let $f : U(a) \cong c$ in $C$ be an isomorphism from $U(a)$ to $c$. Here $U : C^T \to C$ is our forgetful functor. I believe that $U$ is always(!) an isofibration.

view this post on Zulip David Egolf (Feb 06 2025 at 20:00):

To show this, we need to find an isomorphism in $C^T$ that maps down to $f$ by $U$. Such an isomorphism would be a commutative square of this form, where $f$ and $f'$ are both isomorphisms:
commutative square

view this post on Zulip David Egolf (Feb 06 2025 at 20:00):

If we set $f' = T(f)$, we know that $f'$ is an isomorphism because functors preserve isomorphisms. And we can make the diagram commute by setting $b = f \circ a \circ T(f)^{-1}$. This isomorphism maps down to $f$ under $U$ - as $U$ just grabs the lower part of this diagram - so I think $U$ is an isofibration!

view this post on Zulip David Egolf (Feb 06 2025 at 20:01):

(In the above example, we might think of $a : T(U(a)) \to U(a)$ as a group, $U(a)$ as its underlying set, and $c$ as a set.)

view this post on Zulip John Baez (Feb 06 2025 at 20:03):

Yes, that sounds right! Great! The formula $f \circ a \circ T(f)^{-1}$ generalizes how you might transport any binary operation $O : X^2 \to X$ along a bijection $f : X \to X'$ to get a new binary operation $O' : X'^2 \to X'$:

$$O'(x, y) = f(O(f^{-1}(x), f^{-1}(y)))$$
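(A minimal generic sketch of that transport recipe - the names here are made up just for illustration:)

```python
def transport(op, f, f_inv):
    """Transport a binary operation op on X along a bijection f : X -> X'
    (with inverse f_inv), giving O'(x, y) = f(op(f_inv(x), f_inv(y)))."""
    return lambda x, y: f(op(f_inv(x), f_inv(y)))

# Example: move addition mod 3 on {0, 1, 2} across the bijection 0 -> 'a', 1 -> 'b', 2 -> 'c'.
to_letter = {0: 'a', 1: 'b', 2: 'c'}
to_number = {v: k for k, v in to_letter.items()}
add = transport(lambda m, n: (m + n) % 3, to_letter.get, to_number.get)
print(add('b', 'c'))  # 'a', since 1 + 2 = 0 (mod 3)
```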

view this post on Zulip John Baez (Feb 06 2025 at 20:09):

So this shows that when I said any 'reasonable' structure on a set (or indeed an object in any category) can be transported along bijections (or isomorphisms), this includes all structures that come from monads.

There are many 'reasonable' structures on sets that don't come from monads, like 'being a topological space'. So one might want an even more general theorem.

But let me instead describe an 'unreasonable' structure. Define a 'goofy structure' on sets like this:

If the set $X$ is $\{1,2,3,4,5,6,7,8\}$, a goofy structure on $X$ is a way of making it into a group.

If the set $X$ is anything not equal to $\{1,2,3,4,5,6,7,8\}$, a goofy structure is a way of making it into a field.

view this post on Zulip David Egolf (Feb 07 2025 at 18:30):

The "goofy structure" I suppose is "unreasonable" because the goofy structures we can put on a set can be essentially different even for isomorphic sets. So this concept is violating the principle of equivalence.

view this post on Zulip David Egolf (Feb 07 2025 at 18:31):

One might wonder how one can detect if a notion of structure is "reasonable" in the sense that it doesn't lead to essentially different structures on isomorphic objects.

view this post on Zulip Kevin Carlson (Feb 07 2025 at 18:33):

Well...how about, if the forgetful functor from the category of those structures is an isofibration? :grinning:

view this post on Zulip David Egolf (Feb 07 2025 at 18:34):

Yes! I was just about to propose something like that! That seems pretty reasonable to me.

view this post on Zulip Kevin Carlson (Feb 07 2025 at 18:35):

I'd say that's exactly the conceptual content of the notion of isofibration, so I think we're on the same page.

view this post on Zulip David Egolf (Feb 07 2025 at 18:36):

We could then say that a category of (reasonable) structures on the objects in a category $C$ is a category $D$ equipped with an isofibration $U : D \to C$.

view this post on Zulip Kevin Carlson (Feb 07 2025 at 18:37):

I might just complicate that with the "stuff, structure, property" triumvirate--if $U$ is not faithful, I probably wouldn't want to call $D$ a category of structures on $C$. But (one wonders!) surely there are non-faithful isofibrations.

view this post on Zulip Mike Shulman (Feb 07 2025 at 19:24):

Yes, e.g. the projection $\mathrm{Set} \times \mathrm{Set} \to \mathrm{Set}$ (either one) is an isofibration.

view this post on Zulip Kevin Carlson (Feb 07 2025 at 19:25):

Of course, that was just meant to be a temptation to David to think about it.

view this post on Zulip Kevin Carlson (Feb 07 2025 at 19:26):

Indeed, every functor is equivalent (in an appropriate sense) to an isofibration; one might enjoy learning about the canonical model structure on the category of small categories in this light.

view this post on Zulip Mike Shulman (Feb 07 2025 at 19:26):

Oh, sorry.

view this post on Zulip Kevin Carlson (Feb 07 2025 at 19:26):

No worries, my fault for being coy.

view this post on Zulip John Baez (Feb 08 2025 at 00:20):

David Egolf said:

The "goofy structure" I suppose is "unreasonable" because the goofy structures we can put on a set can be essentially different even for isomorphic sets. So this concept is violating the principle of equivalence.

Right! But here's an interesting twist: the concept of 'isofibration' itself violates the principle of equivalence. You can have two functors

$$F, G : \mathsf{C} \to \mathsf{D}$$

that are naturally isomorphic, with one being an isofibration and the other not. Let me see if I can make up an easy example. The identity functor

$$1 : \mathsf{FinSet} \to \mathsf{FinSet}$$

is an isofibration (since all identity functors are). But it's naturally isomorphic to a functor that sends all $n$-element sets to the same $n$-element set. (Yet another puzzle: see why?) This latter functor is clearly not an isofibration.

view this post on Zulip John Baez (Feb 08 2025 at 00:21):

So if you call the violation of the principle of equivalence "evil", and you think being an isofibration is not evil, i.e. "good", you're in the funny situation that it's evil to say a functor is good!

view this post on Zulip John Baez (Feb 08 2025 at 00:37):

In fact, every functor can be 'repaired' to get an isofibration, without changing anything important.

More precisely, given a functor

$$F : C \to D$$

we can make $C$ into a subcategory of a larger category $C'$, where the inclusion

$$i : C \to C'$$

is an equivalence, and find an isofibration

$$F' : C' \to D$$

such that

$$F \cong F' \circ i$$

view this post on Zulip John Baez (Feb 08 2025 at 00:40):

The idea is that we puff up the category $C$ to get a bigger but equivalent category $C'$, which allows us to replace the functor $F : C \to D$ by a functor $F' : C' \to D$ that's an isofibration, basically because $C'$ has enough isomorphisms to do all the lifting we want in the definition of 'isofibration'.

view this post on Zulip David Egolf (Feb 09 2025 at 22:53):

John Baez said:

But here's an interesting twist: the concept of 'isofibration' itself violates the principle of equivalence. You can have two functors

$$F, G : \mathsf{C} \to \mathsf{D}$$

that are naturally isomorphic, with one being an isofibration and the other not. Let me see if I can make up an easy example. The identity functor

$$1 : \mathsf{FinSet} \to \mathsf{FinSet}$$

is an isofibration (since all identity functors are). But it's naturally isomorphic to a functor that sends all $n$-element sets to the same $n$-element set. (Yet another puzzle: see why?) This latter functor is clearly not an isofibration.

That makes sense! (And I think I was able to work out the details of checking that we have a natural isomorphism between two such functors.)

view this post on Zulip David Egolf (Feb 09 2025 at 23:02):

John Baez said:

The idea is that we puff up the category $C$ to get a bigger but equivalent category $C'$, which allows us to replace the functor $F : C \to D$ by a functor $F' : C' \to D$ that's an isofibration, basically because $C'$ has enough isomorphisms to do all the lifting we want in the definition of 'isofibration'.

Interesting! Intuitively, I guess the repair process creates a version of FF that better respects the principle of equivalence, in terms of what is in its image.

view this post on Zulip John Baez (Feb 10 2025 at 00:25):

I think I was able to work out the details of checking that we have a natural isomorphism between two such functors.

Excellent! It's closely related to the idea of a [[skeleton]]: every category $\mathsf{C}$ is equivalent to a category $\mathsf{C}_0$ where isomorphic objects are equal, and you can set it up so that the functor $\mathsf{C}_0 \to \mathsf{C}$ in this equivalence is the inclusion of a subcategory, gotten by choosing one object from each isomorphism class of objects in $\mathsf{C}$.

When we do this trick for $\mathsf{C} = \mathsf{FinSet}$, we usually choose the objects of $\mathsf{C}_0$ to be the ordinals - our 'favorite' $n$-element sets.