Is there any theory that deals with "the normal distribution with variance $d$, such that $d^2 = 0$"? Equivalently stated, is there a model of SDG with a good Giry monad on it?
You're making me think of nonstandard analysis, where we can have a step function of width $1/N$ and height $N$ where $N$ is a 'hyperinteger', that is, an integer that's bigger than any standard integer. Its integral is $1$, so it serves as a version of the Dirac delta. But I guess we can equally well define, in nonstandard analysis, a Gaussian
$$\sqrt{\frac{N}{2\pi}}\, e^{-N x^2/2}$$
where the variance is $1/N$ with $N$ a nonstandard integer (or indeed any hyperreal that's bigger than all reals).
This should have integral 1 if I got the formula right.
Given how nonstandard analysis is set up, there should be a whole nonstandard theory of the Giry monad that closely mimics the usual theory.
But the difference between what you're suggesting and this is that even if $N$ is a hyperinteger so $1/N$ is infinitesimal, $(1/N)^2$ is not zero: it's just a smaller infinitesimal!
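(A minimal sketch of why the normalization is unproblematic, assuming the Gaussian above is written with variance $1/N$ for an infinite hyperinteger $N$: the usual normalization statement transfers.)

```latex
% Sketch: normalization of the nonstandard Gaussian via the transfer principle.
% For every standard real N > 0 the internal statement
%   \int \sqrt{N/2\pi}\, e^{-N x^2/2}\, dx = 1
% holds, so by transfer it holds for every hyperreal N > 0, in particular for an
% infinite hyperinteger N, where the variance 1/N is a nonzero infinitesimal.
\[
  \int_{-\infty}^{\infty} \sqrt{\tfrac{N}{2\pi}}\; e^{-N x^2/2}\, dx \;=\; 1
  \qquad \text{for every hyperreal } N > 0.
\]
```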
Owen Lynch said:
Is there any theory that deals with "the normal distribution with variance $d$, such that $d^2 = 0$"? Equivalently stated, is there a model of SDG with a good Giry monad on it?
As for the second question, there is the "distribution monad" on models of SDG, considered by Kock I think. It should be mentioned somewhere in "Commutative Monads as a Theory of Distributions" if I recall the name of that paper correctly. This also includes distributions like the derivative of the delta distribution etc. But you could then restrict to positive, normalized distributions if that bothers you. Maybe that's a starting point for what you're looking for?
Owen Lynch said:
Is there any theory that deals with "the normal distribution with variance $d$, such that $d^2 = 0$"? Equivalently stated, is there a model of SDG with a good Giry monad on it?
Maybe I am not understanding the question, but Gaussian distributions can be defined with respect to a positive semidefinite covariance matrix. In this case, you can have no variance/covariance in some variables, where of course $0^2 = 0$. If the covariance matrix is not strictly positive definite then there will be no probability density function, but it still has a characteristic function.
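(A quick numerical illustration of this point, a sketch not taken from the thread: a Gaussian with a singular, merely positive-semidefinite covariance has no density on $\mathbb{R}^2$, but its characteristic function $\varphi(t) = \exp(i\,\mu^\top t - \tfrac12 t^\top \Sigma t)$ is perfectly well defined and matches a Monte Carlo estimate.)

```python
# Sketch: a Gaussian with singular (rank-1) covariance has no density on R^2,
# but its characteristic function phi(t) = exp(i <mu,t> - (1/2) t^T Sigma t)
# is still well defined and matches a Monte Carlo estimate of E[exp(i <t, X>)].
import numpy as np

mu = np.array([1.0, 2.0])
Sigma = np.array([[1.0, 1.0],
                  [1.0, 1.0]])          # rank 1: all mass lies on a line in R^2

def char_fn(t):
    return np.exp(1j * (mu @ t) - 0.5 * (t @ Sigma @ t))

rng = np.random.default_rng(0)
X = rng.multivariate_normal(mu, Sigma, size=200_000)   # sampling only needs Sigma PSD
t = np.array([0.3, -0.7])
print("closed form :", char_fn(t))
print("Monte Carlo :", np.mean(np.exp(1j * (X @ t))))
```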
https://arxiv.org/pdf/math/0407242.pdf I'm aware (and interested, from afar) of this paper which studies the normal distribution in a Cahiers topos, with the intent of resolving the scenario where the heat kernel is defined as a case statement: the centered normal distribution $N(0,t)$ when $t > 0$, or the Dirac delta $\delta_0$ if $t = 0$.
Just a newbie question: SDG stands for "synthetic differential geometry"?
Cole Comfort said:
Owen Lynch said:
Is there any theory that deals with "the normal distribution with variance $d$, such that $d^2 = 0$"? Equivalently stated, is there a model of SDG with a good Giry monad on it?
Maybe I am not understanding the question, but Gaussian distributions can be defined with respect to a positive semidefinite covariance matrix. In this case, you can have no variance/covariance in some variables, where of course $0^2 = 0$. If the covariance matrix is not strictly positive definite then there will be no probability density function, but it still has a characteristic function.
Then if you are happy to give up probability density functions, you can just work with characteristic functions. And you can push forward the characteristic functions along affine transformations and so on. It is even well-enough behaved that you can compose the characteristic functions of extended Gaussian distributions (Gaussian distributions on linear subspaces) relationally, regarded as a subcategory of $\mathbb{C}$-affine relations, using complex analysis instead of nonstandard analysis.
Following work in geometric quantization, this is how my coauthors and I managed to give a categorical semantics for "infinitely squeezed" Dirac deltas in Gaussian quantum mechanics, avoiding Schwartz distributions and nonstandard analysis. But at the same time, there have been nonstandard attempts to give semantics to the same end. The connection between the complex and nonstandard approaches is very mysterious to me, and I wonder if there is some formal correspondence between the two.
Cole Comfort said:
Owen Lynch said:
Is there any theory that deals with "the normal distribution with variance $d$, such that $d^2 = 0$"? Equivalently stated, is there a model of SDG with a good Giry monad on it?
Maybe I am not understanding the question, but Gaussian distributions can be defined with respect to a positive semidefinite covariance matrix. In this case, you can have no variance/covariance in some variables, where of course $0^2 = 0$. If the covariance matrix is not strictly positive definite then there will be no probability density function, but it still has a characteristic function.
Since Owen mentioned SDG (synthetic differential geometry), where you have infinitesimals that are not zero but square to zero, I assumed he was asking if *in SDG* you can have a Gaussian whose variance is not zero but squares to zero. But he left out the italicized phrase.
Since I don't see how to put an SDG-style infinitesimal into the formula
$$\frac{1}{\sqrt{2\pi d}}\; e^{-x^2/2d}$$
I switched the topic slightly to nonstandard analysis, where you can do this.
Since @Cole Comfort mentioned categories of Gaussian probability kernels such as $\mathsf{Gauss}$, which allows zero variance, I thought to mention that we can also regard all Markov categories as embedded in Kleisli categories for commutative monads on cartesian categories. So we can embed $\mathsf{Gauss}$ in a Kleisli category.
The canonical construction would give us a "GiryGauss" monad on the category of presheaves on a category of affine maps between Euclidean spaces. I'm not sure whether anyone wrote this down (@Dario Stein?). Not SDG but getting a bit closer in spirit.
A bit of this is in my thesis; I also did some (unpublished) work with Alex Simpson confirming that Gaussians can be treated elegantly as sheaves on affine maps as you say, or over the category of Euclidean co-isometries (i.e. matrices $A$ such that $AA^T = I$). That makes them a bit like a continuous version of nominal sets ;) I'm not very familiar with SDG however -- what is required for that?
@Dario Stein
Actual sheaves, not just presheaves?
Sheaves for the atomic topology, much like in this work of Alex Simpson, or his upcoming LICS paper. This is another aspect that is reminiscent of nominal sets.
Dario Stein said:
A bit of this is in my thesis; I also did some (unpublished) work with Alex Simpson confirming that Gaussians can be treated elegantly as sheaves on affine maps as you say, or over the category of Euclidean co-isometries (i.e. matrices $A$ such that $AA^T = I$). That makes them a bit like a continuous version of nominal sets ;) I'm not very familiar with SDG however -- what is required for that?
I look forward to reading more about this when it is published. Do you know if Gaussian relations can also be constructed in a similar way?
John Baez said:
Since I don't see how to put an SDG-style infinitesimal into the formula
$$\frac{1}{\sqrt{2\pi d}}\; e^{-x^2/2d}$$
I switched the topic slightly to nonstandard analysis, where you can do this.
One issue is probably that you don't know that $d \geq 0$, and also I think it is false that $d$ is invertible. Leaving aside the issue of taking the square root! (one might conceivably work around this by considering something like a higher-order infinitesimal $\epsilon$ with $\epsilon^3 = 0$, so that $\epsilon^2$ is not zero, and consider $d = \epsilon^2$, but we are likely already hosed by the other facts I mentioned)
There are models of SDG with invertible infinitesimals, too, but I guess this is changing the question again? (Whose motivation was what precisely, other than "cool, infinitesimals"? ;))
I think the motivation is that Owen wants to do a synthetic version of stochastic differential equations.
Yeah, you might think of it not as a pdf, but rather as "the infinitesimal timestep for an SDE"
The cool thing about Brownian motion is how in a time step $dt$ the particle moves a random amount whose standard deviation is $\sqrt{dt}$, which suggests a formalism where you have square roots of infinitesimals, which are bigger than the infinitesimals themselves.
This is related to how Brownian paths are fractal.
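(A small simulation illustrating that $\sqrt{dt}$ scaling, a sketch not from the thread: increments of a Brownian path over a window of length $dt$ have standard deviation close to $\sqrt{dt}$, so the difference quotients blow up as $dt \to 0$.)

```python
# Sketch: simulate a Brownian path on a fine grid and check that increments over a
# window of length dt have standard deviation ~ sqrt(dt) rather than ~ dt.
import numpy as np

rng = np.random.default_rng(1)
n, step = 2**20, 2.0**-20                       # fine grid on [0, 1]
B = np.cumsum(rng.normal(0.0, np.sqrt(step), size=n))

for k in (2**6, 2**10, 2**14):                  # window length dt = k * step
    dt = k * step
    inc = B[k:] - B[:-k]                        # increments over windows of length dt
    print(f"dt = {dt:.1e}   empirical std = {inc.std():.4f}   sqrt(dt) = {np.sqrt(dt):.4f}")
```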
Cole Comfort said:
Following work in geometric quantization, this is how my coauthors and I managed to give a categorical semantics for "infinitely squeezed" states such as Dirac deltas in Gaussian quantum mechanics, avoiding Schwartz distributions and nonstandard analysis. But at the same time, there have been nonstandard attempts to give categorical semantics for quantum mechanics with Dirac deltas. The connection between the complex and nonstandard approaches is very mysterious to me, and I wonder if there is some formal correspondence between the two.
Cole, is the work of you and your coauthors available somewhere? It looks interesting.
Evan Patterson said:
I think the motivation is that Owen wants to do a synthetic version of stochastic differential equations.
It would be very cool if @John Baez and @Jason Erbele's use of linear relations over $\mathbb{R}(s)$ to compose "systems of constant-coefficient linear ordinary differential equations" could be adapted to some class of stochastic differential equations by means of Gaussian relations [1,2,3].
Jean-Baptiste Vienney said:
Cole, is the work of you and your coauthors available somewhere? It looks interesting.
Yes, I have included 3 articles on Gaussian relations with 3 different descriptions in the previous response. The connection between all 3 is a bit confusing to me.
OK, I think I've managed to finally get my thoughts in order about this, and I wrote them up briefly here if anyone is interested.
Is there a nice way of seeing this as a "stochastic vector field"? Classically, no: this type of equation cannot be formalized by probability distributions over tangent spaces.
@Owen Lynch, is this some well-known thing in the field, or is there a simple explanation for why this is true which could be understood by someone like myself who knows almost nothing about stochastic calculus?
I know you weren't asking me, but I can't resist: there certainly is an interesting class of stochastic differential equations that can be formalized using probability distributions on tangent spaces!
If you have this, a solution will be a 'random differentiable curve' whose tangent vector at each point is randomly picked out by the probability distribution on the tangent space at that point.
But the problem is that a lot of the most interesting stochastic differential equations are wilder than this. The fundamental one is 'Brownian motion'.
Heuristically, we can write Brownian motion in $\mathbb{R}$ as the solution of
$$\frac{dq(t)}{dt} = w(t)$$
where $q$ is a randomly chosen path and $w$ is white noise.
We can also make this perfectly rigorous, but it takes work! For starters, it takes work to formalize what white noise actually is!
But if you try to pretend it gives a probability distribution on each tangent space of $\mathbb{R}$ you run into trouble, because you wind up wanting it to be a Gaussian of infinite variance!
So, it's almost like that old puzzle where someone tells you to pick a uniformly distributed random real number and you realize that makes no sense because there's no uniform probability measure on the real line. But in this situation there's a way around that problem.
And it's amusing that Owen is now trying to formalize the idea of a Gaussian with infinitesimal variance.
Anyway, white noise can be formalized and then you can show that random paths $q$ solve the stochastic differential equation I wrote down, and the solution is called Brownian motion. Then you can prove Brownian motion is almost surely (i.e., with probability one) nondifferentiable at each time $t$, but almost surely continuous.
John Baez said:
But if you try to pretend it gives a probability distribution on each tangent space of $\mathbb{R}$ you run into trouble, because you wind up wanting it to be a Gaussian of infinite variance!
Hmm, interesting. I was hoping that there was some sort of connection which could easily be made to Kähler quantization in symplectic geometry, where Gaussian probability distributions on phase space respecting the "semiclassical uncertainty relations" are represented by Lagrangian submanifolds whose complex part is positive semidefinite. Because in this setting you can still represent semiclassical states with "infinite variance" (more accurately, states of completely uncorrelated noise), interpreted as the maximally mixed states, by means of maximally coisotropic submanifolds.
There seems to be some superficial similarity, so one more thing to add to the list of things I have to read about.
John Baez said:
Anyway, white noise can be formalized and then you can show that random paths $q$ solve the stochastic differential equation I wrote down, and the solution is called Brownian motion. Then you can prove Brownian motion is almost surely (i.e., with probability one) nondifferentiable at each time $t$, but almost surely continuous.
How do you formalize white noise?? I'm only familiar with the semigroup approach and the stochastic integration approach to SDEs.
Also, maybe this is a way of putting my idea for what a Gaussian with infinitesimal variance should be. The dual to a map $X \to Y$ is a positive linear map $L^\infty(Y) \to L^\infty(X)$. I think the Gaussian at $x$ with infinitesimal variance $d$ should be $f \mapsto f(x) + \frac{d}{2}(\Delta f)(x)$, where $\Delta$ is the Laplacian.
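(A finite-variance sanity check of this formula, a sketch with a small real variance `s2` standing in for the infinitesimal $d$: smoothing $f$ by a centered Gaussian of variance `s2` agrees with $f(x) + \tfrac{s2}{2} f''(x)$ up to $O(s2^2)$, which is exactly the term that would vanish if $d^2 = 0$.)

```python
# Sketch: for a small variance s2, integrating f against the Gaussian N(x, s2)
# agrees with f(x) + (s2/2) * f''(x) up to O(s2^2) -- the finite-variance shadow
# of the proposed map f |-> f(x) + (d/2) (Laplacian f)(x) with d^2 = 0.
import numpy as np

f, d2f = np.cos, lambda y: -np.cos(y)   # test function and its second derivative
x = 0.3
for s2 in (1e-1, 1e-2, 1e-3):
    sigma = np.sqrt(s2)
    y = np.linspace(x - 10*sigma, x + 10*sigma, 20001)
    density = np.exp(-(y - x)**2 / (2*s2)) / np.sqrt(2*np.pi*s2)
    smoothed = np.sum(f(y) * density) * (y[1] - y[0])   # Riemann sum for ∫ f dN(x, s2)
    second_order = f(x) + 0.5 * s2 * d2f(x)
    print(f"s2 = {s2:.0e}   |smoothed - second_order| = {abs(smoothed - second_order):.2e}")
```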
Owen Lynch said:
John Baez said:
Anyway, white noise can be formalized and then you can show that random paths $q$ solve the stochastic differential equation I wrote down, and the solution is called Brownian motion. Then you can prove Brownian motion is almost surely (i.e., with probability one) nondifferentiable at each time $t$, but almost surely continuous.
How do you formalize white noise?? I'm only familiar with the semigroup approach and the stochastic integration approach to SDEs.
As a random distribution usually, i.e. a probability measure on the space of tempered distributions $\mathcal{S}'(\mathbb{R})$. It's easy to construct if you've already constructed Brownian motion: just take the pushforward of its law along the distributional derivative.
Owen Lynch said:
Also, maybe this is a way of putting my idea for what a Gaussian with infinitesimal variance should be. The dual to a map $X \to Y$ is a positive linear map $L^\infty(Y) \to L^\infty(X)$. I think the Gaussian at $x$ with infinitesimal variance $d$ should be $f \mapsto f(x) + \frac{d}{2}(\Delta f)(x)$, where $\Delta$ is the Laplacian.
Of course, that doesn't actually work because $\Delta$ is not positive. However, it is almost positive -- $(\Delta f)(x) \geq 0$ if $f \geq 0$ and $f(x) = 0$. And this is the condition for the map given by $f \mapsto f(x) + \frac{d}{2}(\Delta f)(x)$ to be positive!
Owen Lynch said:
I think the Gaussian at $x$ with infinitesimal variance $d$ should be $f \mapsto f(x) + \frac{d}{2}(\Delta f)(x)$, where $\Delta$ is the Laplacian.
By "the Gaussian" do you mean the probability density function for a Gaussian probability distribution?
No, I mean that Markov kernels $X \to Y$ are dual to positive linear maps $L^\infty(Y) \to L^\infty(X)$.
(or really, , but who's counting)
In the same way that measurable functions $X \to Y$ are dual to von Neumann algebra maps $L^\infty(Y) \to L^\infty(X)$.
@Owen Lynch This is a bit outside of my area of expertise, so please correct me if I am wrong.
I am kind of repeating myself here, but I think I am coming to understand your motivation better.
For the sake of simplicity, take $X$ to be the singleton set. Is it not the case that you are assuming that the positive linear map above is a probability density function for some probability distribution?
I think this is the root of the problem, because asking for the existence of probability density functions is too strong. At least as far as I understand, probability distributions are not in canonical bijection with probability density functions.
If you take the Fourier transform of the probability density function of a given probability distribution, this is the characteristic function for that probability distribution. However, conversely, given a multivariate Gaussian probability distribution with positive-semidefinite covariance, it is only possible to define the probability density function when the covariance is strictly positive definite. But it is always possible to define the characteristic function!
So that is why I think you should work with characteristic functions rather than with infinitesimals because, at least in the Euclidean setting, lots of the categorical semantics for Gaussian probability with positive-semidefinite covariance is already worked out. But if you are committed to the SDG route, maybe there is lots of juicy categorical semantics waiting to be discovered!
It is just that the category of Gaussian relations is currently getting a bit of attention, and has a complete equational theory, which could save you some work if it is actually useful for your purposes. On the one hand, it is expressive enough to have states which pick out points with no variance (reminiscent of infinitesimals); on the other hand, it has states which uniformly relate the point to the whole space (reminiscent of infinities), which I think is the correct semantics for your white noise in this setting.
No, it's not a probability density: it's the result of integrating an $L^\infty$ function against the distribution. The problem with Gaussian relations is that solutions to SDEs involve all sorts of probability distributions, not just Gaussians!
Here's an overview I wrote up for the positive linear map stuff: https://www.localcharts.org/t/variable-first-probability-theory/1875
And here's a more rigorous reference: https://www.semanticscholar.org/paper/From-Kleisli-Categories-to-Commutative-C*-algebras%3A-Furber-Jacobs/e7e86c8cda2f5067449d6facce4a9a964c57fe31
I definitely want to learn more about characteristic functions though, so I'll check out some of your references!
Owen Lynch said:
How do you formalize white noise??
Benedikt gave a good answer: you can describe white noise as a probability measure on the space of tempered distributions. A more simple-minded way of saying approximately the same thing is that instead of trying to define white noise directly we define a real-valued random variable $w(f)$ for each 'test function' $f$, which can be a Schwartz function or even an $L^2$ function. We heuristically think of $w(f)$ as meaning $\int w(t) f(t)\, dt$, but we never say what $w(t)$ is directly. We can define these random variables in such a way that each one has mean zero:
$$E[w(f)] = 0$$
but also
$$E[w(f)\, w(g)] = \int f(t)\, g(t)\, dt$$
for all $f, g$. And so on: there are explicit formulas for all the expected values $E[w(f_1) \cdots w(f_n)]$. These closely mimic the usual formulas for the moments of a Gaussian on a finite-dimensional vector space, so we can think of white noise heuristically as a Gaussian on $L^2(\mathbb{R})$ - but it's not really a probability measure on $L^2(\mathbb{R})$.
I helped my advisor write a book on this and related topics, but unfortunately it is not very easy to read.
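(A discretized sanity check of those moment formulas, a sketch not from the thread: approximating white noise on a grid by independent $N(0, 1/\Delta t)$ values, and $w(f)$ by a Riemann sum, reproduces $E[w(f)] \approx 0$ and $E[w(f)w(g)] \approx \int f g$.)

```python
# Sketch: grid-discretized white noise. Take independent samples W_i ~ N(0, 1/dt)
# at grid points and approximate w(f) by sum_i W_i f(t_i) dt. Empirically
# E[w(f)] ~ 0 and E[w(f) w(g)] ~ ∫ f g, mimicking a "standard Gaussian on L^2".
import numpy as np

rng = np.random.default_rng(2)
t = np.linspace(0.0, 1.0, 501)
dt = t[1] - t[0]
f, g = np.sin(2*np.pi*t), np.exp(-t)

W = rng.normal(0.0, 1.0/np.sqrt(dt), size=(20_000, t.size))   # one row per noise sample
wf = (W * f).sum(axis=1) * dt                                  # samples of w(f)
wg = (W * g).sum(axis=1) * dt                                  # samples of w(g)

print("E[w(f)]      ~", wf.mean())                  # ~ 0
print("E[w(f) w(g)] ~", (wf * wg).mean())           # ~ ∫ f g dt
print("∫ f g dt     ~", (f * g).sum() * dt)
```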
OK, maybe I can think now about how to get a Brownian motion out of this. A Brownian motion should be a random variable that takes values in continuous maps $[0,\infty) \to \mathbb{R}$. If we convert the equation
$$\frac{dq(t)}{dt} = w(t)$$
into an integral equation
then we can "integrate by parts" to get
I'd then like to interpret formally as something like , assuming that there's some pullback operation on probability measures on the space of tempered distributions, so then I get
Is this the right track or am I totally off?
Cole Comfort said:
Owen Lynch This is a bit outside of my area of expertise, so please correct me if I am wrong.
It is just that the category of Gaussian relations is currently getting a bit of attention, and has a complete equational theory, which could save you some work if it is actually useful for your purposes. On the one hand, it is expressive enough to have states which pick out points with no variance (reminiscent of infinitesimals); on the other hand, it has states which uniformly relate the point to the whole space (reminiscent of infinities), which I think is the correct semantics for your white noise in this setting.
I'm now reading https://drops.dagstuhl.de/storage/00lipics/lipics-vol270-calco2023/LIPIcs.CALCO.2023.13/LIPIcs.CALCO.2023.13.pdf, which I got from the reference in your recent paper, and I think it's very cool! Not exactly what I'm looking for w.r.t. infinitesimals, but definitely pertains to lots of other of my interests. Thanks for the recommendation! I may have to make a localcharts post about this soon.
Owen Lynch said:
OK, maybe I can think now about how to get a Brownian motion out of this. A Brownian motion should be a random variable that takes values in continuous maps $[0,\infty) \to \mathbb{R}$. If we convert the equation
No, for Brownian motion we take white noise as a function of time and solve
$$\frac{dq(t)}{dt} = w(t), \qquad q(0) = 0.$$
This is a lot easier to deal with!
Ah, that is a lot easier!!!
So I suppose for n-dimensional Brownian motion, we just take n white noises.
(i.e., n independent white noises)
I want to think about how this extends to SDEs; don't spoil it!
I won't give it away. Note that with the way I've (sketchily) defined white noise and Brownian motion, for each interval $[s,t]$ we get a random variable
$$q(t) - q(s) = w(1_{[s,t]})$$
and for disjoint intervals these random variables are stochastically independent. This is part of what we expect from Brownian motion: how it moves for some interval of time can't be predicted given how it's moved before.
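(A quick empirical check of that independence, a sketch: increments of simulated Brownian paths over disjoint intervals are uncorrelated, while increments over overlapping intervals are visibly correlated.)

```python
# Sketch: increments of Brownian motion over disjoint intervals are independent
# (here we only check that they are uncorrelated); overlapping increments are not.
import numpy as np

rng = np.random.default_rng(3)
n_paths, n_steps = 50_000, 200
dt = 1.0 / n_steps
B = np.cumsum(rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps)), axis=1)

inc_a = B[:, 49]                 # increment over [0, 0.25]
inc_b = B[:, 149] - B[:, 99]     # increment over [0.5, 0.75]: disjoint from [0, 0.25]
inc_c = B[:, 99]                 # increment over [0, 0.5]: overlaps [0, 0.25]

print("corr, disjoint intervals   :", np.corrcoef(inc_a, inc_b)[0, 1])  # ~ 0
print("corr, overlapping intervals:", np.corrcoef(inc_a, inc_c)[0, 1])  # ~ sqrt(1/2) ≈ 0.71
```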
OK, here's my guess. Given white noise, where , a solution to the SDE
is a path-valued random variable such that for all , and test function ,
Or probably "almost all"
In particular, we can take to get a formula for
I suppose this shows how to define white noise, if you start with a Brownian motion $B$, then define
$$w(f) = -\int B(t)\, f'(t)\, dt.$$
I don't really know a lot about stochastic differential equations (except for Brownian motion), so all I can honestly say is that your answer looks good to me. The last sentence is definitely true: a bunch of people assume they know what Brownian motion is, and define white noise to be the derivative of that, in the distributional sense:
$$w(f) = -\int B(t)\, f'(t)\, dt.$$
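(And a last numerical sketch, assuming the weak definition $w(f) = -\int B(t) f'(t)\,dt$ with a test function $f$ vanishing at the endpoints: the variance of $w(f)$ comes out as $\int f^2$, matching the white noise moment formulas above.)

```python
# Sketch: define white noise weakly as w(f) = -∫ B(t) f'(t) dt (the distributional
# derivative of Brownian motion) and check empirically that Var[w(f)] ≈ ∫ f(t)^2 dt.
import numpy as np

rng = np.random.default_rng(4)
t = np.linspace(0.0, 1.0, 1001)
dt = t[1] - t[0]
f = np.sin(np.pi * t) ** 2          # smooth test function vanishing at both endpoints
df = np.gradient(f, dt)             # f'

n_paths = 10_000
B = np.cumsum(rng.normal(0.0, np.sqrt(dt), size=(n_paths, t.size)), axis=1)
wf = -(B * df).sum(axis=1) * dt     # w(f) = -∫ B f' dt, one value per simulated path

print("Var[w(f)] ~", wf.var())              # ~ ∫ f^2 dt = 3/8
print("∫ f^2 dt  ~", (f**2).sum() * dt)
```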