You're reading the public-facing archive of the Category Theory Zulip server.
I've been working on learning some of the introductory ideas of algebraic geometry. To my understanding, in this context there is a duality between algebra and geometry:
In the extreme, there is a correspondence between maximal ideals (large and algebraic) and "smallest points" (small and geometric).
Intuitively, this seems like a manifestation of the duality between information and ambiguity. The more one knows about something, or the more that one observes something, the less ambiguity remains regarding what that something can possibly be. So if I pick some mystery geometric region and gradually tell you more and more polynomials that vanish on that region, then you will be able to gradually sharpen your estimate of my chosen mystery region.
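The "mystery region" game above can be sketched in a few lines of code. This is my own toy illustration (the circle and line are arbitrary choices, not anything from the discussion): each polynomial known to vanish on the mystery region cuts down the set of candidate points on a small integer grid.

```python
# Toy illustration: each polynomial known to vanish on a mystery region
# shrinks the set of candidate points. The "mystery point" here is (3, 4).
grid = [(x, y) for x in range(-6, 7) for y in range(-6, 7)]

constraints = [
    lambda x, y: x**2 + y**2 - 25,  # "it lies on this circle"
    lambda x, y: x - 3,             # "it lies on this vertical line"
]

candidates = grid
for f in constraints:
    candidates = [(x, y) for (x, y) in candidates if f(x, y) == 0]
    print(len(candidates), "candidates remain")

print(candidates)  # the ambiguity left after all the information given
```

Each new vanishing polynomial is a piece of information, and the shrinking candidate list is the remaining ambiguity.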
Because I care about medical imaging, I care about this duality between information and ambiguity in the case where the information provided is imperfect. In this case, one's estimates of what is present are also imperfect. Motivated by this, I'm curious if algebraic geometry has been developed in the direction of exploring the duality between algebraic and geometric structures in the case where the algebra is "probabilistic" in some sense.
As an initial guess, perhaps one could try to define a probabilistic notion of a polynomial. Instead of being able to assess if a polynomial is zero or not at a point in question, we would instead be given the ability to assess the probability that a "probabilistic polynomial" is zero at some point in question.
To summarize, I have two specific questions:
I'd be really interested if anyone has done something like this, but I feel you're stretching algebraic geometry in ways it doesn't want to go. Since you mention "information" and "probability", I'd be tempted instead to advertise information theory as the power tool for questions like yours.
Given two random variables X and Y, we can mathematically define three quantities: the entropy H(X), the conditional entropy H(X|Y), and the mutual information I(X;Y).
Then there's an equation:
I(X;Y) = H(X) - H(X|Y)
This says how much learning the value of Y reduces how much you don't know concerning the value of X, on average. That's how I'd be tempted to formalize the "duality" you are sensing.
It takes time to understand these concepts, but anyone who wants to understand the universe needs to learn about them - and they are very general and beautiful. Luckily they are explained on Wikipedia (at the links), and even better here:
The equation I wrote down is equation (2.39) in this book.
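As a sanity check, the identity I(X;Y) = H(X) − H(X|Y) can be verified numerically on any small joint distribution. The distribution below is my own arbitrary example:

```python
# Numerical check of I(X;Y) = H(X) - H(X|Y) on a small joint distribution.
from math import log2

# joint distribution p(x, y) over X in {0, 1}, Y in {0, 1} (arbitrary values)
p = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

px = {x: sum(v for (a, b), v in p.items() if a == x) for x in (0, 1)}
py = {y: sum(v for (a, b), v in p.items() if b == y) for y in (0, 1)}

H_X = -sum(v * log2(v) for v in px.values())                      # entropy
H_X_given_Y = -sum(v * log2(v / py[y]) for (x, y), v in p.items())  # cond. entropy
I_XY = sum(v * log2(v / (px[x] * py[y])) for (x, y), v in p.items())  # mutual info

print(I_XY, H_X - H_X_given_Y)  # the two sides agree
```

Learning Y here strictly reduces the uncertainty about X, since the two variables are dependent.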
Thanks for reminding me of those concepts! Back in engineering grad school, I spent a little bit of time using these concepts to try and think about ultrasound imaging. I even spent some time with that book by Cover and Thomas. Ultimately other work was progressing towards publication more quickly and so we abandoned the idea. But it was interesting stuff! It might be fun to go back to it, now that I know some more math...
(We were specifically playing around with these concepts in simulation, trying to see if it was possible to optimize the length of an ultrasound pulse in 1D, given some statistical assumptions about the medium we're transmitting it through. (The longer the transmitted pulse the stronger your resulting observed signal and so the more robust you are to noise, but also the more autocorrelation crosstalk you get). I forget exactly what we were optimizing though... presumably we were trying to minimize the conditional entropy regarding the target given our resulting observations!?)
Btw, if you ever want to see how entropy emerges naturally from category theory, there's an explanation in Tom's free book:
And I have my own free book on entropy in physics. There's remarkably little overlap in these three books, and a lot of basic stuff not covered by any of them, but Cover and Thomas is the one you want for the issues I was just mentioning.
I'll mention algebraic statistics, without thinking too hard about whether it's truly an answer to the question.
The section "Recent work using commutative algebra and algebraic geometry" might help @David Egolf come up with a good answer to his question, but I'm afraid it will take work: it's certainly not handing out anything on a silver platter.
From the nlab: "Tropical geometry is often thought of as algebraic geometry over the tropical semiring." This tropical semiring is the min-plus rig and is used in statistical inference, for example here: The Generalized Distributive Law...
Joe Moeller said:
I'll mention algebraic statistics, without thinking too hard about whether it's truly an answer to the question.
Thanks for linking that page! On that page, there are several things that look interesting to me. For example, from that article I found a paper by David Mumford called "Pattern Theory: The Mathematics of Perception".
The section "Recent work using commutative algebra and algebraic geometry" is also interesting, and probably relevant to what I asked about in this thread.
Roughly, the idea sketched in that section appears to me to be this: If we have a discrete probability distribution determined by some unknown parameter, then at least in some cases the individual probability values in each possible distribution are those that satisfy some polynomial equation. So we can then study our collection of possible probability distributions by studying where a certain polynomial goes to zero - which can presumably be done using algebraic geometry.
In this approach, it appears that we construct (standard) polynomials of interest from some probabilistic thing of interest. That's not a perfect match with what I was wondering about in this thread, but it's still interesting.
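The simplest instance of that sketch, as I understand it, is the independence model for a 2x2 table: a joint distribution comes from independent marginals exactly when the polynomial p00*p11 − p01*p10 vanishes. A minimal check (my own example values):

```python
# Statistical models as varieties, in miniature: a 2x2 joint distribution
# lies on the independence model exactly when this 2x2 determinant vanishes.
def det2(p00, p01, p10, p11):
    return p00 * p11 - p01 * p10

# a distribution built from independent marginals lies on the variety
a, b = 0.3, 0.6  # P(X=0), P(Y=0) -- arbitrary parameters
independent = (a * b, a * (1 - b), (1 - a) * b, (1 - a) * (1 - b))
print(det2(*independent))  # ~0: on the variety

# a generic (dependent) distribution does not
dependent = (0.4, 0.1, 0.2, 0.3)
print(det2(*dependent))    # nonzero: off the variety
```

So the set of all independent distributions really is the zero set of an ordinary polynomial, which is what lets algebraic geometry get a foothold.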
Simon Burton said:
From the nlab: "Tropical geometry is often thought of as algebraic geometry over the tropical semiring." This tropical semiring is the min-plus rig and is used in statistical inference, for example here: The Generalized Distributive Law...
This also looks quite interesting! I'll plan to leave a lengthier comment here once I've spent some time looking at this paper.
A bunch of people have written about how the tropical rig is the limit, as the temperature T approaches zero, of a rig that's important in statistical mechanics at temperature T. I first learned about this from James Dolan and I could easily bloviate about it at length. I wrote a short version here:
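The T → 0 limit can be seen numerically: the "softened minimum" −T·log(Σᵢ exp(−xᵢ/T)) tends to minᵢ xᵢ as the temperature T shrinks, which is exactly how ordinary addition deforms into the tropical (min-plus) addition. A quick sketch of this standard observation:

```python
# The tropical limit numerically: softmin approaches min as T -> 0.
from math import exp, log

def softmin(xs, T):
    return -T * log(sum(exp(-x / T) for x in xs))

xs = [3.0, 1.0, 4.0]
for T in (1.0, 0.1, 0.01):
    print(T, softmin(xs, T))  # approaches min(xs) = 1.0
```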
But one really broad lesson is that a lot of algebraic geometry can be generalized from commutative rings to commutative rigs and even more general commutative Lawvere theories - Lawvere theories where all the operations "commute" in a suitable sense. (It takes a while to figure out what it means for a 3-ary operation and a 5-ary operation to "commute", but the answer is visually satisfying.) Nikolai Durov has a mammoth book generalizing lots of algebraic geometry (e.g. schemes) from commutative rings to commutative Lawvere theories, but part of why this book is so huge is that he assumes the reader knows little of category theory, and proceeds at a slow pace.
In cryptography, polynomials are used to encrypt data. More specifically, cryptographers use ideals to introduce noise (randomness) into the encrypted data. To draw an analogy with your problem, @David Egolf, encrypted data can be compared to raw data awaiting analysis.
Below, I will denote by m some valuable piece of information that one wants to discover. In the context of encryption, m is a message that an attacker wants to recover, but maybe in your case @David Egolf, m is some information that you want to discover through analysis (just to give some ideas here).
Now, the idea is to have a polynomial p that encrypts/hides the information m. In general, this polynomial is considered modulo a certain polynomial q, so that it makes sense to write p mod q. An important detail to keep in mind is that the inequality deg(p) ≥ deg(q) will typically hold. However, we assume that m is small enough to have m mod q = m.
The polynomial p is usually encoded as a sum p = c + m, so that m = p − c. This means that if you know c, then you can recover m using the subtraction p − c. In other words, the quantity p − c will likely be the key focus of your analysis when trying to learn m.
Still, encryption doesn't stop there. The encryption is usually hidden by an extra polynomial e (this is the noise), which belongs to an ideal of the form (g), where g is some other polynomial. In other words, we have p = c + m + e, where e ∈ (g).
In practice, we also assume that the noise e is small enough that (m + e) mod q = m + e.
Depending on the size of the noise , two scenarios can occur:
A few remarks:
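To make the "small noise decrypts, big noise doesn't" phenomenon concrete, here is a toy integer analogue in the style of the DGHV scheme. This is my own simplification and not the polynomial construction described above: a bit m is hidden as c = m + 2e + p·r for a secret odd p, and decryption (c mod p) mod 2 recovers m exactly when the noise term m + 2e stays well below p.

```python
# Toy DGHV-style encryption over the integers (illustration only, not secure).
import random

random.seed(0)
p = 10007  # the secret (an odd number)

def encrypt(m, noise_bound):
    e = random.randrange(noise_bound)  # the noise
    r = random.randrange(1, 100)       # the "ideal" part, a multiple of p
    return m + 2 * e + p * r

def decrypt(c):
    return (c % p) % 2

# small noise: decryption always succeeds, since m + 2e < p/2
assert all(decrypt(encrypt(m, 100)) == m for m in (0, 1))

# huge noise (far beyond p): the reduction mod p wraps and parity is lost
survived = sum(decrypt(encrypt(1, 10**6)) == 1 for _ in range(50))
print(survived, "of 50 decrypted correctly")  # typically only around half
```

The two scenarios above correspond to the noise staying inside, or escaping, the "decryptable" range.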
Not exactly on topic but perhaps worth noting that multilayer perceptrons with ReLU activations instantiate tropical polynomial maps. Meanwhile, transformer blocks at zero temperature are also (essentially?) algebraic maps. So large language models and their ilk at zero temperature are distributions formed from tropical polynomials.
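A minimal check of the ReLU/tropical connection (my own sketch, with hypothetical unit parameters): under the max-plus operations a ⊕ b := max(a, b) and a ⊗ b := a + b, a ReLU unit relu(w·x + b) with integer weight w ≥ 0 is literally the tropical polynomial max(w·x + b, 0), i.e. the tropical sum of the monomials b ⊗ x^w and 0.

```python
# A single ReLU unit agrees with its max-plus ("tropical") polynomial.
def relu(t):
    return max(t, 0.0)

def tropical_monomial(x, power, coeff):
    return power * x + coeff  # tropically: coeff (x) x^power

w, b = 2, -3  # hypothetical unit parameters (integer weight, any bias)
for x in (-2.0, 0.0, 1.0, 5.0):
    unit = relu(w * x + b)
    trop = max(tropical_monomial(x, w, b), tropical_monomial(x, 0, 0.0))
    assert unit == trop
print("ReLU unit agrees with its tropical polynomial")
```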
Thanks everyone! All these comments look interesting! It will, however, take me some time to get to them all.
Simon Burton said:
From the nlab: "Tropical geometry is often thought of as algebraic geometry over the tropical semiring." This tropical semiring is the min-plus rig and is used in statistical inference, for example here: The Generalized Distributive Law...
I'm still working on reading this paper, but I already learned something pretty interesting from it! This paper tells me that one can form a "commutative semiring" (which we might also call a "commutative rig") from any distributive lattice. The nLab tells me that any Heyting algebra forms a distributive lattice. And "Sheaves in Geometry and Logic" tells me that if X is any object in an (elementary?) topos then the subobjects of X form a Heyting algebra.
So, any object X in any (elementary?) topos comes with an associated commutative rig, where the elements of the rig are the subobjects of X!
Yes, that's fun. I believe a distributive lattice is the same as a commutative rig where x + x = x and x·x = x for all x.
The idea is that "or" is like +, but it obeys "x or x = x", and similarly "and" is like ·, but it obeys "x and x = x".
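The simplest instance of this correspondence can be checked by brute force (my own sketch): the two-element Boolean algebra with "or" as addition and "and" as multiplication satisfies all the commutative rig laws, with both operations idempotent.

```python
# Brute-force check that ({False, True}, or, and) is a commutative rig
# with both operations idempotent.
from itertools import product

B = (False, True)
add = lambda x, y: x or y   # "or" plays the role of +
mul = lambda x, y: x and y  # "and" plays the role of x

for x, y, z in product(B, repeat=3):
    assert add(x, y) == add(y, x) and mul(x, y) == mul(y, x)   # commutativity
    assert add(add(x, y), z) == add(x, add(y, z))              # associativity
    assert mul(mul(x, y), z) == mul(x, mul(y, z))
    assert mul(x, add(y, z)) == add(mul(x, y), mul(x, z))      # distributivity
for x in B:
    assert add(x, x) == x and mul(x, x) == x                   # idempotence
print("all rig laws hold")
```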
That is very cool! So presumably we can do some amount of logic with the subobjects of an object of an elementary topos.
Wrapping back around to algebraic geometry, I guess we use this "logic" when talking about the subsets of some set on which different polynomial ideals vanish.
In that setting, we have some nice relationships. For example, V(I) ∩ V(J) = V(I + J), where V(I) is the subset of some set X on which some ideal I of polynomials vanishes, and I + J is the ideal generated by I and J.
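A grid-level sanity check of that relation (my own toy version, with hypothetical generators): a point kills every generator of I and every generator of J exactly when it kills all generators of I and J taken together, which generate the ideal sum.

```python
# Common zeros of two ideals = zeros of the ideal generated by both.
I_gens = [lambda x, y: x * y]          # hypothetical generators of I
J_gens = [lambda x, y: x + y - 2]      # hypothetical generators of J

grid = [(x, y) for x in range(-6, 7) for y in range(-6, 7)]
zeros = lambda gens: {p for p in grid if all(g(*p) == 0 for g in gens)}

# the sum ideal is generated by the union of the generators
assert zeros(I_gens) & zeros(J_gens) == zeros(I_gens + J_gens)
print(sorted(zeros(I_gens + J_gens)))  # [(0, 2), (2, 0)]
```

Intersection of vanishing sets corresponds to sum of ideals, just as the lattice "and" corresponds to a rig operation.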
I wonder if one could try and develop this setup using subobjects of some object in a topos different from Set. So then we would presumably want this: V(I) ∧ V(J) = V(I + J), where ∧ is the product in the lattice of subobjects of some object in some topos E.
For example, one might do this if one wished to study "curves" where each "curve" is a subgraph or a subpresheaf instead of a subset.
(This idea catches my attention in part because I have the idea that it may be interesting to represent "knowledge states" in the context of imaging as presheaves.)
Anyways, I should probably spend a lot more time on standard algebraic geometry before trying out something like this!
John Baez said:
Yes, that's fun. I believe a distributive lattice is the same as a commutative rig where x + x = x and x·x = x for all x.
John Baez said:
The idea is that "or" is like +, but it obeys "x or x = x", and similarly "and" is like ·, but it obeys "x and x = x".
Discussion on whether this is true or not has been moved to #learning: questions > commutative rigs, distributive lattices, etc.
However I think people in that thread have come up with a commutative rig where x + x = x and x·x = x for all x but which is not a distributive lattice. Apparently one also needs the condition that 1 is absorbing for +, i.e. 1 + x = 1, which then automatically implies the absorption law x + x·y = x.
I'm still working on reading the above paper "The Generalized Distributive Law". It's slow going because I keep getting interested in spin-off thoughts :laughing:.
So far I don't yet see how the tropical rig is important in this paper - I'm still looking forward to getting to that.
A couple interesting tidbits from the paper so far, though:
(I suppose a similar thing potentially holds true for any equation that holds in an algebraic structure).
Re distributive laws between monads: they are directed, so for example if you have a rig freely generated by some elements x_i, you can write any element of the rig as a sum of products of the x_i, but not as a product of sums of the x_i.
In other words, "expanding" a product of sums is a systematic process, but "simplifying" a sum of products is not so straightforward.
John Baez said:
In other words, "expanding" a product of sums is a systematic process, but "simplifying" a sum of products is not so straightforward.
That makes sense! With regards to saving on computation, we can imagine that x(y + z) is faster to compute (due to having fewer multiplications) compared to xy + xz. And it is very easy to rewrite x(y + z) as xy + xz, but potentially much trickier to rewrite xy + xz as x(y + z). So if we want to save on computation, we need to work for it!
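The multiplication count can be made explicit with a tiny instrumented multiply (my own sketch): the factored form x(y + z) costs one multiplication, the expanded form xy + xz costs two, and both compute the same value.

```python
# Count multiplications in the factored vs expanded form.
mults = 0

def times(a, b):
    global mults
    mults += 1
    return a * b

x, y, z = 3, 5, 7

mults = 0
factored = times(x, y + z)            # 1 multiplication
assert mults == 1

mults = 0
expanded = times(x, y) + times(x, z)  # 2 multiplications
assert mults == 2

assert factored == expanded           # same value either way
```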
I want to work a bit on this thread again. I'm currently working towards learning how the tropical rig can be used in statistical inference, by reading the paper "The Generalized Distributive Law" linked above. (And then I want to take a look at some of the other interesting comments made above! There's so much to learn!)
I made a little more progress in reading that paper, and thought this was pretty interesting. The authors introduce the "MPF problem" ("marginalize a product function"), which refers to a situation where one has a product of functions and wishes to compute a "marginalized" version of it, by adding up all the values the function attains as some of its inputs range over all their possible values.
This is interesting because various famous and useful things have a related MPF problem. As examples, the paper gives the Hadamard transform and the Fourier transform (over any finite abelian group). But I think that any (finite) convolution or cross-correlation will also give an example!
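The MPF idea in miniature (my own toy instance, not an example from the paper): marginalize the product f(x1, x2)·g(x2, x3) over all variables. Brute force sums over all triples; the distributive law pushes each sum inward, and with more variables this reordering is where the big savings appear.

```python
# Marginalize a product of functions: brute force vs distributive law.
vals = range(4)
f = lambda a, b: a + 2 * b + 1  # arbitrary example "local" functions
g = lambda b, c: b * c + 1

# brute force: O(n^3) terms
brute = sum(f(a, b) * g(b, c) for a in vals for b in vals for c in vals)

# distributive law: push each sum as far in as it will go, O(n^2) terms
smart = sum(sum(f(a, b) for a in vals) * sum(g(b, c) for c in vals)
            for b in vals)

assert brute == smart
print(brute)
```

This is the same rewriting as x(y + z) vs xy + xz, applied systematically over a whole sum.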
(Cross-correlation in particular is of interest to me, because it can play an important role in forming ultrasound images from "coded" transmissions. Basically, we send out a particular pattern of ultrasound and then look for that pattern in the echoes we get back using a cross-correlation.)
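A bare-bones version of that coded-transmission idea (my own sketch, with a hypothetical Barker-7 transmit pattern and made-up noise level): transmit a known pattern, then locate it in a noisy echo by cross-correlation; the lag with the largest correlation is the estimated delay.

```python
# Locate a known transmit pattern in a noisy echo by cross-correlation.
import random

random.seed(1)
code = [1, 1, 1, -1, -1, 1, -1]  # Barker-7: low autocorrelation sidelobes
delay = 20
echo = [0.0] * 64
for i, c in enumerate(code):
    echo[delay + i] += c                         # the pattern, buried at `delay`
echo = [s + random.gauss(0, 0.3) for s in echo]  # plus measurement noise

def xcorr(sig, pat, lag):
    return sum(sig[lag + i] * p for i, p in enumerate(pat))

lags = range(len(echo) - len(code) + 1)
best = max(lags, key=lambda L: xcorr(echo, code, L))
print("estimated delay:", best)
```

The low sidelobes of the code are what keep the correlation peak unambiguous, which is the same trade-off as in the pulse-length discussion earlier in the thread.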