Category Theory
Zulip Server
Archive

You're reading the public-facing archive of the Category Theory Zulip server.
To join the server you need an invite. Anybody can get an invite by contacting Matteo Capucci at name dot surname at gmail dot com.
For all things related to this archive refer to the same person.


Stream: learning: questions

Topic: "probabilistic" algebraic geometry?


view this post on Zulip David Egolf (Dec 13 2024 at 01:56):

I've been working on learning some of the introductory ideas of algebraic geometry. To my understanding, in this context there is a duality between algebra and geometry:

In the extreme, there is a correspondence between maximal ideals (large and algebraic) and "smallest points" (small and geometric).

Intuitively, this seems like a manifestation of the duality between information and ambiguity. The more one knows about something, or the more that one observes something, the less ambiguity remains regarding what that something can possibly be. So if I pick some mystery geometric region and gradually tell you more and more polynomials that vanish on that region, then you will be able to gradually sharpen your estimate of my chosen mystery region.

Because I care about medical imaging, I care about this duality between information and ambiguity in the case where the information provided is imperfect. In this case, one's estimates of what is present are also imperfect. Motivated by this, I'm curious if algebraic geometry has been developed in the direction of exploring the duality between algebraic and geometric structures in the case where the algebra is "probabilistic" in some sense.

As an initial guess, perhaps one could try to define a probabilistic notion of a polynomial. Instead of being able to assess if a polynomial is zero or not at a point in question, we would instead be given the ability to assess the probability that a "probabilistic polynomial" is zero at some point in question.

To summarize, I have two specific questions:

view this post on Zulip John Baez (Dec 13 2024 at 02:23):

I'd be really interested if anyone has done something like this, but I feel you're stretching algebraic geometry in ways it doesn't want to go. Since you mention "information" and "probability", I'd be tempted instead to advertise information theory as the power tool for questions like yours.

Given two random variables XX and YY, we can mathematically define three quantities:

Then there's an equation

H(XY)=H(X)I(X;Y) H(X|Y) = H(X) - I(X;Y)

This says how much learning the value of YY reduces how much you don't know concerning the value of XX, on average. That's how I'd be tempted to formalize the "duality" you are sensing.

It takes time to understand these concepts, but anyone who wants to understand the universe needs to learn about them - and they are very general and beautiful. Luckily they are explained on Wikipedia (at the links), and even better here:

The equation I wrote down is equation (2.39) in this book.

view this post on Zulip David Egolf (Dec 13 2024 at 02:34):

Thanks for reminding me of those concepts! Back in engineering grad school, I spent a little bit of time using these concepts to try and think about ultrasound imaging. I even spent some time with that book by Cover and Thomas. Ultimately other work was progressing towards publication more quickly and so we abandoned the idea. But it was interesting stuff! It might be fun to go back to it, now that I know some more math...

(We were specifically playing around with these concepts in simulation, trying to see if it was possible to optimize the length of an ultrasound pulse in 1D, given some statistical assumptions about the medium we're transmitting it through. (The longer the transmitted pulse the stronger your resulting observed signal and so the more robust you are to noise, but also the more autocorrelation crosstalk you get). I forget exactly what we were optimizing though... presumably we were trying to minimize the conditional entropy regarding the target given our resulting observations!?)

view this post on Zulip John Baez (Dec 13 2024 at 02:44):

Btw, if you ever want to see how entropy emerges naturally from category theory, there's an explanation in Tom's free book:

And I have my own free book on entropy in physics. There's remarkably little overlap in these three books, and a lot of basic stuff not covered by any of them, but Cover and Thomas is the one you want for the issues I was just mentioning.

view this post on Zulip Joe Moeller (Dec 13 2024 at 21:59):

I'll mention algebraic statistics without thinking too hard if it's truly an answer to the question.

view this post on Zulip John Baez (Dec 13 2024 at 22:13):

The section "Recent work using commutative algebra and algebraic geometry" might help @David Egolf come up with a good answer to his question, but I'm afraid it will take work: it's certainly not handing out anything on a silver platter.

view this post on Zulip Simon Burton (Dec 14 2024 at 13:11):

From the nlab: "Tropical geometry is often thought of as algebraic geometry over the tropical semiring." This tropical semiring is the min-plus rig and is used in statistical inference, for example here: The Generalized Distributive Law...

view this post on Zulip David Egolf (Dec 15 2024 at 17:55):

Joe Moeller said:

I'll mention algebraic statistics without thinking too hard if it's truly an answer to the question.

Thanks for linking that page! On that page, there are several things that look interesting to me. For example, from that article I found a paper by David Mumford called "Pattern Theory: The Mathematics of Perception".

view this post on Zulip David Egolf (Dec 15 2024 at 18:06):

The section "Recent work using commutative algebra and algebraic geometry" is also interesting, and probably relevant to what I asked about in this thread.

Roughly, the idea sketched in that section appears to me to be this: If we have a discrete probability distribution determined by some unknown parameter, then at least in some cases the individual probability values in each possible distribution are those that satisfy some polynomial equation. So we can then study our collection of possible probability distributions by studying where a certain polynomial goes to zero - which can presumably be done using algebraic geometry.

view this post on Zulip David Egolf (Dec 15 2024 at 18:07):

In this approach, it appears that we construct (standard) polynomials of interest from some probabilistic thing of interest. That's not a perfect match with what I was wondering about in this thread, but it's still interesting.

view this post on Zulip David Egolf (Dec 15 2024 at 18:35):

Simon Burton said:

From the nlab: "Tropical geometry is often thought of as algebraic geometry over the tropical semiring." This tropical semiring is the min-plus rig and is used in statistical inference, for example here: The Generalized Distributive Law...

This also looks quite interesting! I'll plan to leave a lengthier comment here once I've spent some time looking at this paper.

view this post on Zulip John Baez (Dec 15 2024 at 22:56):

A bunch of people have written about how the tropical rig is the T0T \to 0 limit of a rig that's important in statistical mechanics at temperature TT. I first learned about this from James Dolan and I could easily bloviate about it at length. I wrote a short version here:

view this post on Zulip John Baez (Dec 15 2024 at 23:12):

But one really broad lesson is that a lot of algebraic geometry can be generalized from commutative rings to commutative rigs and even more general commutative Lawvere theories - Lawvere theories where all the operations "commute" in a suitable sense. (It takes a while to figure out what it means for a 3-ary operation and a 5-ary operation to "commute", but the answer is visually satisfying.) Nikolai Durov has a mammoth book generalizing lots of algebraic geometry (e.g. schemes) from commutative rings to commutative Lawvere theories, but part of why this book is so huge is that he assumes the reader knows little of category theory, and proceeds at a slow pace.

view this post on Zulip Rémy Tuyéras (Dec 16 2024 at 13:11):

In cryptography, polynomials are used to encrypt data. More specifically, cryptographers use ideals to introduce noise (randomness) into the encrypted data. To draw an analogy with your problem, @David Egolf, encrypted data can be compared to raw data awaiting analysis.

Below, I will denote by mm some valuable information that one wants to discover. In the context of encryption, mm is a message that an attacker wants to discover, but maybe in your case @David Egolf, mm is some information that you want to discover through some analysis (just to give some ideas here).

Now, the idea is to have a polynomial eme_m that encrypts/hides the information mm. In general, this polynomial will be considered modulo a certain polynomial qq such that it makes sense to denote em:=em(modq)e_m' := e_m \,(\mathsf{mod}\,q). An important detail to keep in mind here is that the inequality ememe_m' \neq e_m will likely hold with high probability. However, we assume that mm is small enough to have m=m(modq)m=m \,(\mathsf{mod}\,q).

The polynomial eme_m is usually encoded as a sum am+ba_m+b such that m=am(modq)m = a_m \,(\mathsf{mod}\,q). This means that if you know bb, then you can discover m=(emb)(modq)m = (e_m-b) \,(\mathsf{mod}\,q) using the subtraction emb(modq)e_m'-b \,(\mathsf{mod}\,q). In other words, the quantity bb will likely be the key focus of your research analysis in order to learn mm.

Still, encryption doesn’t stop there. The encryption eme_m is usually hidden by an extra polynomial uu (this is the noise), which would belong to an ideal of the form (p)(p) where pp is some other polynomial. In other words, we have em=am+b+ue_m = a_m+b+u where u(modp)=0u \,(\mathsf{mod}\,p) = 0.

In practice, we also assume that mm is small enough such that m=m(modp)m = m \,(\mathsf{mod}\,p).

Depending on the size of the noise uu, two scenarios can occur:

view this post on Zulip Rémy Tuyéras (Dec 16 2024 at 13:24):

A few remarks:

view this post on Zulip JR (Dec 16 2024 at 15:02):

Not exactly on topic but perhaps worth noting that multilayer perceptrons with ReLU activations instantiate tropical polynomial maps. Meanwhile, transformer blocks at zero temperature are also (essentially?) algebraic maps. So large language models and their ilk at zero temperature are distributions formed from tropical polynomials.

view this post on Zulip David Egolf (Dec 16 2024 at 18:23):

Thanks everyone! All these comments look interesting! It will, however, take me some time to get to them all.

view this post on Zulip David Egolf (Dec 16 2024 at 18:26):

Simon Burton said:

From the nlab: "Tropical geometry is often thought of as algebraic geometry over the tropical semiring." This tropical semiring is the min-plus rig and is used in statistical inference, for example here: The Generalized Distributive Law...

I'm still working on reading this paper, but I already learned something pretty interesting from it! This paper tells me that one can form a "commutative semiring" (which we might also call a "commutative rig") from any distributive lattice. The nLab tells me that any Heyting algebra forms a distributive lattice. And "Sheaves in Geometry and Logic" tells me that if BB is any object in a (elementary?) topos E\mathcal{E} then the subobjects of BB form a Heyting algebra.

view this post on Zulip David Egolf (Dec 16 2024 at 18:28):

So, any object BB in any (elementary?) topos comes with an associated commutative rig, where the elements in the rig are the subobjects of BB!

view this post on Zulip John Baez (Dec 16 2024 at 18:29):

Yes, that's fun. I believe a distributive lattice is the same as a commutative rig where x + x = x and x ×\times x = x for all x.

view this post on Zulip John Baez (Dec 16 2024 at 18:32):

The idea is that "or" is like +, but it obeys "x or x = x", and similarly "and" is like ×\times, but it obeys "x and x = x".

view this post on Zulip David Egolf (Dec 16 2024 at 18:52):

That is very cool! So presumably we can do some amount of logic with the subobjects of an object of an elementary topos.

view this post on Zulip David Egolf (Dec 16 2024 at 18:54):

Wrapping back around to algebraic geometry, I guess we use this "logic" when talking about subsets of some set that vanish under different polynomial ideals.

view this post on Zulip David Egolf (Dec 16 2024 at 18:56):

In that setting, we have some nice relationships. For example, V(a)V(b)=V(a+b)V(a) \cap V(b) = V(a + b) where V(a)V(a) is the subset of some set XX where some ideal aa of polynomials on XX vanishes, and a+ba+b is the ideal generated by aa and bb.

view this post on Zulip David Egolf (Dec 16 2024 at 18:58):

I wonder if one could try and develop this setup using subobjects of some object in a topos different from Set\mathsf{Set}. So then we would presumably want this: V(a)V(b)=V(a+b)V(a) \land V(b) = V(a+b), where \land is the product in the lattice of subobjects of some object XX in some topos E\mathcal{E}.

view this post on Zulip David Egolf (Dec 16 2024 at 19:01):

For example, one might do this if one wished to study "curves" where each "curve" is a subgraph or a subpresheaf instead of a subset.

(This idea catches my attention in part because I have the idea that it may be interesting to represent "knowledge states" in the context of imaging as presheaves.)

view this post on Zulip David Egolf (Dec 16 2024 at 19:15):

Anyways, I should probably spend a lot more time on standard algebraic geometry before trying out something like this!

view this post on Zulip Madeleine Birchfield (Dec 17 2024 at 00:06):

John Baez said:

Yes, that's fun. I believe a distributive lattice is the same as a commutative rig where x + x = x and x ×\times x = x for all x.

John Baez said:

The idea is that "or" is like +, but it obeys "x or x = x", and similarly "and" is like ×\times, but it obeys "x and x = x".

Discussion on whether this is true or not has been moved to #learning: questions > commutative rigs, distributive lattices, etc.

However I think people in that thread have come up with a commutative rig where x+x=xx + x = x and x×x=xx \times x = x for all xx but which is not a distributive lattice. Apparently one needs to also have the condition that 11 is absorbing for ++, which then automatically implies that x+x=xx + x = x.

view this post on Zulip David Egolf (Dec 17 2024 at 19:28):

I'm still working on reading the above paper "The Generalized Distributive Law". It's slow going because I keep getting interested in spin-off thoughts :laughing:.

So far I don't yet see how the tropical rig is important in this paper - I'm still looking forward to getting to that.

view this post on Zulip David Egolf (Dec 17 2024 at 19:29):

A couple interesting tidbits from the paper so far, though:

view this post on Zulip David Egolf (Dec 17 2024 at 19:30):

(I suppose a similar thing potentially holds true for any equation that holds in an algebraic structure).

view this post on Zulip David Egolf (Dec 17 2024 at 19:31):

view this post on Zulip John Baez (Dec 17 2024 at 19:35):

Re distributive laws between monads: they are directed, so for example if you have a rig freely generated by some elements xix_i you can write any element of the rig as sum of products of the xix_i, but not as a product of sums of the xix_i.

view this post on Zulip John Baez (Dec 17 2024 at 19:40):

In other words, "expanding" a product of sums is a systematic process, but "simplifying" a sum of products is not so straightforward.

view this post on Zulip David Egolf (Dec 18 2024 at 19:26):

John Baez said:

In other words, "expanding" a product of sums is a systematic process, but "simplifying" a sum of products is not so straightforward.

That makes sense! With regards to saving on computation, we can imagine that a(b+c+d)a(b+c+d) is faster to compute (due to having fewer multiplications) compared to ab+ac+adab+ac+ad. And it is very easy to rewrite a(b+c+d)a(b+c+d) as ab+ac+adab+ac+ad, but potentially much trickier to rewrite ab+ac+adab+ac+ad as a(b+c+d)a(b+c+d). So if we want to save on computation, we need to work for it!

view this post on Zulip David Egolf (Jan 05 2025 at 20:27):

I want to work a bit on this thread again. I'm currently working towards learning how the tropical rig can be used in statistical interference, by reading the paper "The Generalized Distributive Law" linked above. (And then I want to take a look at some of the other interesting comments made above! There's so much to learn!)

view this post on Zulip David Egolf (Jan 05 2025 at 21:13):

I made a little more progress in reading that paper, and thought this was pretty interesting. The authors introduce the "MPF problem", which refers to a situation where one has a product of functions and then wishes to compute a "marginalized" version of it, by adding up all the values that function attains as some input ranges over all possible values.

view this post on Zulip David Egolf (Jan 05 2025 at 21:14):

This is interesting because various famous and useful things have a related MPF problem. For examples, the paper gives the Hadamard transform and the Fourier transform (over any finite abelian group). But I think that any (finite) convolution or cross-correlation will also give an example!

view this post on Zulip David Egolf (Jan 05 2025 at 21:15):

(Cross-correlation in particular is of interest to me, because it can play an important role in forming ultrasound images from "coded" transmissions. Basically, we send out a particular pattern of ultrasound and then look for that pattern in the echoes we get back using a cross-correlation.)