Category Theory
Zulip Server
Archive

You're reading the public-facing archive of the Category Theory Zulip server.
To join the server you need an invite. Anybody can get an invite by contacting Matteo Capucci at name dot surname at gmail dot com.
For all things related to this archive refer to the same person.

Stream: theory: categorical probability

Topic: Infinitesimal Distributions

Owen Lynch (Apr 24 2024 at 00:08):

Is there any theory that deals with "the normal distribution with variance $d$ , such that $d^2 = 0$ "? Equivalently stated, is there a model of SDG with a good Giry monad on it?

John Baez (Apr 24 2024 at 08:01):

You're making me think of nonstandard analysis, where we can have a step function of width $1/n$ and height $n$ where $n$ is a 'hyperinteger', that is, an integer that's bigger than any standard integer. Its integral is $1$ so it serves as a version of the Dirac delta. But I guess we can equally well define, in nonstandard analysis, a Gaussian

$p(x) = \exp(-x^2/2\sigma) / \sqrt{2 \pi \sigma}$

where the variance $\sigma$ is a nonstandard integer (or indeed any hyperreal that's bigger than all reals).

John Baez (Apr 24 2024 at 08:01):

This should have integral 1 if I got the formula right.

John Baez (Apr 24 2024 at 08:03):

Given how nonstandard analysis is set up, there should be a whole nonstandard theory of the Giry monad that closely mimics the usual theory.

John Baez (Apr 24 2024 at 08:04):

But the difference between what you're suggesting and this is that even if $n$ is a hyperinteger so $1/n$ is infinitesimal, $1/n^2$ is not zero: it's just a smaller infinitesimal!

Benedikt Peterseim (Apr 24 2024 at 08:22):

Owen Lynch said:

Is there any theory that deals with "the normal distribution with variance $d$, such that $d^2 = 0$"? Equivalently stated, is there a model of SDG with a good Giry monad on it?

As for the second question, there is the "distribution monad" on models of SDG, considered by Kock I think. It should be mentioned somewhere in "Commutative Monads as a Theory of Distributions" if I recall the name of that paper correctly. This also includes distributions like the derivative of the delta distribution etc. But you could then restrict to posit normalized distributions if that bothers you. Maybe that’s a starting point for what you’re looking for?

Cole Comfort (Apr 24 2024 at 11:32):

Owen Lynch said:

Is there any theory that deals with "the normal distribution with variance $d$ , such that $d^2 = 0$ "? Equivalently stated, is there a model of SDG with a good Giry monad on it?

Maybe I am not understanding the question, but Gaussian distributions can be defined with respect to a positive semidefinite covariance matrix. In this case, then you can have no variance/covariance in some variables, where of course $0^2=0$ . If the covariance matrix not strictly positive definite then there will be no probability density function, but it still has a characteristic function.

Matt Cuffaro (he/him) (Apr 24 2024 at 11:35):

https://arxiv.org/pdf/math/0407242.pdf I'm aware (and interested, from afar) of this paper which studies the normal distribution in a Cahiers topos, with the intent of resolving the scenario where the heat kernel is defined as a case statement: the centered normal distribution $p(x; 0, \sigma)$ or $\delta(0)$ if $\sigma=0$

Peva Blanchard (Apr 24 2024 at 11:36):

Just a newbie question: SDG stands for "synthetic differential geometry"?

Cole Comfort (Apr 24 2024 at 11:41):

Cole Comfort said:

Owen Lynch said:

Is there any theory that deals with "the normal distribution with variance $d$ , such that $d^2 = 0$ "? Equivalently stated, is there a model of SDG with a good Giry monad on it?

Maybe I am not understanding the question, but Gaussian distributions can be defined with respect to a positive semidefinite covariance matrix. In this case, then you can have no variance/covariance in some variables, where of course $0^2=0$ . If the covariance matrix not strictly positive definite then there will be no probability density function, but it still has a characteristic function.

Then if you are happy to give up probability density functions, you can just work with characteristic functions. And you can push forward the characteristic functions along affine transformations and so on. It is even well-enough behaved that you can compose the characteristic functions of extended Gaussian distributions (Gaussian distributions on linear subspaces) relationally, regarded as a subcategory of $\mathbb C$ -affine relations, using complex analysis instead of nonstandard analysis.

Cole Comfort (Apr 24 2024 at 12:06):

Following work in geometric quantization, this is how me and my coauthors managed to give a categorical semantics for "infinitely squeezed" Dirac deltas in Gaussian quantum mechanics, avoiding Schwartz distributions and nonstandard analysis. But at the same time, there have been nonstandard attempts to give semantics to the same end. The connection between the complex vs nonstandard approach is very mysterious to me, and I wonder if there is some formal correspondance between both.

John Baez (Apr 24 2024 at 12:38):

Cole Comfort said:

Owen Lynch said:

Is there any theory that deals with "the normal distribution with variance $d$ , such that $d^2 = 0$ "? Equivalently stated, is there a model of SDG with a good Giry monad on it?

Maybe I am not understanding the question, but Gaussian distributions can be defined with respect to a positive semidefinite covariance matrix. In this case, then you can have no variance/covariance in some variables, where of course $0^2=0$ . If the covariance matrix not strictly positive definite then there will be no probability density function, but it still has a characteristic function.

Since Owen mentioned SDG (synthetic differential geometry), where you have infinitesimals that are not zero but square to zero, I assumed he was asking if in SDG you can have a Gaussian whose variance $d$ is not zero but square to zero. But he left out the italicized phrase.

John Baez (Apr 24 2024 at 12:40):

Since I don't see how to put an SDG-style infinitesimal $d$ into the formula

$\exp(-x^2/2d) / \sqrt{2\pi d}$

I switched the topic slightly to nonstandard analysis, where you can do this.

Sam Staton (Apr 28 2024 at 10:09):

Since @Cole Comfort mentioned categories of Gaussian probability kernels $\mathbf{Gauss}$ , which allows zero variance, I thought to mention that we can also regard all Markov categories as embedded in Kleisli categories for commutative monads on cartesian categories. So we can embed $\mathbf{Gauss}$ in a Kleisli category.

Sam Staton (Apr 28 2024 at 10:09):

The canonical construction would give us a "GiryGauss" monad on the category of presheaves on a category of affine maps between Euclidean spaces. I'm not sure whether anyone wrote this down (@Dario Stein?). Not SDG but getting a bit closer in spirit.

Dario Stein (Apr 28 2024 at 11:31):

A bit of this is in my thesis; I also did some (unpublished) work with Alex Simpson confirming that Gaussians can be treated elegantly as sheaves on affine maps as you say, or over the category of Euclidean co-isometries (i.e. matrices such that $AA^T = I$ ). That makes them a bit like a continuous version of nominal sets ;) I'm not very familiar with SDG however -- what is required for that?

Cole Comfort (Apr 28 2024 at 14:09):

@Dario Stein

Actual sheaves, not just presheaves?

Dario Stein (Apr 28 2024 at 14:12):

Sheaves for the atomic topology, much like in this work of Alex Simpson, or his upcoming LICS paper. This is another aspect that is reminiscent of nominal sets.

Cole Comfort (Apr 28 2024 at 21:46):

Dario Stein said:

A bit of this is in my thesis; I also did some (unpublished) work with Alex Simpson confirming that Gaussians can be treated elegantly as sheaves on affine maps as you say, or over the category of Euclidean co-isometries (i.e. matrices such that $AA^T = I$ ). That makes them a bit like a continuous version of nominal sets ;) I'm not very familiar with SDG however -- what is required for that?

I look forward to reading more about this when it is published. Do you know if Gaussian relations can also be constructed in a similar way?

David Michael Roberts (Apr 29 2024 at 02:14):

John Baez said:

Since I don't see how to put an SDG-style infinitesimal $d$ into the formula

$\exp(-x^2/2d) / \sqrt{2\pi d}$

I switched the topic slightly to nonstandard analysis, where you can do this.

One issue is probably that you don't know that $d > 0$ , and also I think it is false that $d$ is invertible. Leaving aside the issue of taking the square root! (one might conceivably work around this by considering something like a higher-order infinitesimal $d_1$ , so that $d_1^2$ is not zero, and consider $d=d_1^2$ , but we are likely already hosed by the other facts I mentioned)

Benedikt Peterseim (Apr 29 2024 at 12:50):

There are models of SDG with invertible infinitesimals, too, but I guess this is changing the question again? ( Whose motivation was what precisely other than "cool, infinitesimals"? ;) )

Evan Patterson (Apr 29 2024 at 17:12):

I think the motivation is that Owen wants to do a synthetic version of stochastic differential equations.

Owen Lynch (May 01 2024 at 05:33):

Yeah, you might think of it not as a pdf, but rather as "the infinitesimal timestep for an SDE"

John Baez (May 01 2024 at 07:11):

The cool thing about Brownian motion is how in a time step $\Delta t$ the particle moves a random amount whose standard deviation is $\sqrt{\Delta t}$ , which suggests a formalism where you have square roots of infinitesimals, which are bigger than the infinitesimals themselves.

John Baez (May 01 2024 at 07:16):

This is related to how Brownian paths are fractal.

Jean-Baptiste Vienney (May 01 2024 at 07:23):

Cole Comfort said:

Following work in geometric quantization, this is how me and my coauthors managed to give a categorical semantics for "infinitely squeezed" states such as Dirac deltas in Gaussian quantum mechanics, avoiding Schwartz distributions and nonstandard analysis. But at the same time, there have been nonstandard attempts to give categorical semantics for quantum mechanics with dirac deltas. The connection between the complex vs nonstandard approach is very mysterious to me, and I wonder if there is some formal correspondance between both.

Cole, is the work of you and your coauthors available somewhere? It looks interesting.

Cole Comfort (May 01 2024 at 18:08):

Evan Patterson said:

I think the motivation is that Owen wants to do a synthetic version of stochastic differential equations.

It would be very cool if @John Baez and @Jason Erbele.'s use of linear relations over $\mathbb{R} (s)$ to compose "system of constant-coefficient linear ordinary differential equations" could be adapted to some class of stochastic differential equations by means of Gaussian relations [1,2,3].

Cole Comfort (May 01 2024 at 18:10):

Jean-Baptiste Vienney said:

Cole, is the work of you and your coauthors available somewhere? It looks interesting.

Yes, I have included 3 articles on Gaussian relations with 3 different descriptions in the previous response. The connection between all 3 is a bit confusing to me.

Owen Lynch (May 03 2024 at 06:40):

OK, I think I've managed to finally get my thoughts in order about this, and I wrote them up briefly here if anyone is interested.

Cole Comfort (May 03 2024 at 07:31):

Is there a nice way of seeing this as a "stochastic vector field"? Classically, no: this type of equation cannot be formalized by probability distributions over tangent spaces

@Owen Lynch, is this some well-known thing in the field, or is there a simple explanation for why this is true which could be understood by someone like myself who knows almost nothing about stochastic calculus?

John Baez (May 03 2024 at 07:40):

I know you weren't asking me, but I can't resist: there certainly is an interesting class of stochastic differential equations that can be formalized using probability distributions on tangent spaces!

John Baez (May 03 2024 at 07:41):

If you have this, a solution will be a 'random differentiable curve' whose tangent vector at each point is randomly picked out by the probability distribution on the tangent space at that point.

John Baez (May 03 2024 at 07:42):

But the problem is that a lot of the most interesting stochastic differential equations are wilder than this. The fundamental one is 'Brownian motion'.

John Baez (May 03 2024 at 07:44):

Heuristically, we can write Brownian motion in $\mathbb{R}^n$ as the solution of

$\frac{d x}{d t} = W(x)$

where $x \colon \mathbb{R} \to \mathbb{R}^n$ is a randomly chosen path and $W$ is white noise.

John Baez (May 03 2024 at 07:45):

We can also make this perfectly rigorous, but it takes work! For starters, it takes work to formalize what white noise actually is!

John Baez (May 03 2024 at 07:47):

But if you try to pretend it gives a probability distribution on each tangent space of $\mathbb{R}^n$ you run into trouble, because you wind up wanting it to be a Gaussian of infinite variance!

John Baez (May 03 2024 at 07:58):

So, it's almost like that old puzzle where someone tells you to pick a uniformly distributed random real number and you realize that makes no sense because there's no uniform probability measure on the real line. But in this situation there's a way around that problem.

And it's amusing that Owen is now trying to formalize the idea of a Gaussian with infinitesimal variance.

John Baez (May 03 2024 at 07:59):

Anyway, white noise can be formalized and then you can show solve the stochastic differential equation I wrote down, and the solution is called Brownian motion. Then you can prove Brownian motion is almost surely (i.e., with probability one) nondifferentiable at each time $t$ , but almost surely continuous.

Cole Comfort (May 03 2024 at 14:01):

John Baez said:

But if you try to pretend it gives a probability distribution on each tangent space of $\mathbb{R}^n$ you run into trouble, because you wind up wanting it to be a Gaussian of infinite variance!

Hmm, interesting. I was hoping that there was some sort of connection which could easily be made to Kähler quantization in Symplectic geometry, where Gaussian probability distributions on phase space respecting the "semiclassical uncertainty relations" are represented by Lagrangian submanifolds whose complex part is positive semidefinite. Because in this setting you can still represent semiclassical states with "infinite variance" (more accurately states of completely uncorrelated noise), interpreted as the maximally mixed states, by means of maximally coisotropic submanifolds.

There seems to be some superficial similarity, so one more thing to the list of things I have to read about.

Owen Lynch (May 03 2024 at 15:50):

John Baez said:

Anyway, white noise can be formalized and then you can show solve the stochastic differential equation I wrote down, and the solution is called Brownian motion. Then you can prove Brownian motion is almost surely (i.e., with probability one) nondifferentiable at each time $t$ , but almost surely continuous.

How do you formalize white noise?? I'm only familiar with the semigroup approach and the stochastic integration approach to SDEs.

Owen Lynch (May 03 2024 at 15:59):

Also, maybe this is a way of putting my idea for what a Gaussian with infinitesimal variance should be. The dual to a map $1 \to \mathcal{P}(X)$ is a positive linear map $C(X) \to \mathbb{R}$ . I think the Gaussian at $x \in X$ should be $f \mapsto \Delta(f)(x)$ , where $\Delta$ is the Laplacian.

Benedikt Peterseim (May 03 2024 at 16:02):

Owen Lynch said:

John Baez said:

Anyway, white noise can be formalized and then you can show solve the stochastic differential equation I wrote down, and the solution is called Brownian motion. Then you can prove Brownian motion is almost surely (i.e., with probability one) nondifferentiable at each time $t$, but almost surely continuous.

How do you formalize white noise?? I'm only familiar with the semigroup approach and the stochastic integration approach to SDEs.

As a random distribution usually, i.e. a probability measure on the space of tempered distributions $S'(\mathbb{R}^n)$ . It‘s easy to construct if you‘ve already constructed Brownian motion: just take the pushforward of its law along the distributional derivative.

Owen Lynch (May 03 2024 at 16:05):

Owen Lynch said:

Also, maybe this is a way of putting my idea for what a Gaussian with infinitesimal variance should be. The dual to a map $1 \to \mathcal{P}(X)$ is a positive linear map $C(X) \to \mathbb{R}$ . I think the Gaussian at $x \in X$ should be $f \mapsto \Delta(f)(x)$ , where $\Delta$ is the Laplacian.

Of course, that doesn't actually work because $\Delta$ is not positive. However, it is almost positive -- $\Delta(f)(x) \geq 0$ if $f \geq 0$ and $f(x) = 0$ . And this is the condition for the map $C(X) \to \mathbb{R}[d](d^2)$ given by $f \mapsto f(x) + \Delta(f)(x)d$ to be positive!

Cole Comfort (May 03 2024 at 16:05):

Owen Lynch said:

I think the Gaussian at $x \in X$ should be $f \mapsto \Delta(f)(x)$ , where $\Delta$ is the Laplacian.

By "the Gaussian" do you mean the probability density function for a Gaussian probability distribution?

Owen Lynch (May 03 2024 at 16:07):

No, I mean that Markov kernels $X \to \mathcal{P}(Y)$ are dual to positive linear maps $C(Y) \to C(X)$ .

Owen Lynch (May 03 2024 at 16:07):

(or really, $L^\infty(Y) \to L^\infty(X)$ , but whose counting)

Owen Lynch (May 03 2024 at 16:09):

In the same way that measurable functions $X \to Y$ are dual to von Neumann algebra maps $L^\infty(Y) \to L^\infty(X)$ .

Cole Comfort (May 03 2024 at 17:07):

@Owen Lynch This is a bit outside of my area of expertise, so please correct me if I am wrong.

I am kind of repeating myself here, but I think I am coming to understand your motivation better.

For the sake of simplicity, take $X$ to be the singleton set. Is it not the case, that you are assuming that the positive linear map $L^\infty (Y) \to \mathbb{R}$ above is a probability density function for some probability distribution?

I think this is the root of the problem, because asking for the existence of probability density functions is too strong. At least as far as I understand, probability distributions are not in canonical bijection with probability density functions.

If you take the Fourier transform of the probability density function of a given probability distribution, this is the characteristic function for that probability distribution. However, conversely, given a multivariate Gaussian probability distributions with positive-semidefinite covariance, then it is only possible to define the probability density function when the covariance is strictly positive. But it is always possible to define the characteristic function!

So that is why I think you should work with characteristic functions rather than with infinitesimals because, at least in the Euclidean setting, lots of the categorical semantics for Gaussian probability with positive-semidefinite covariance is already worked out. But if you are committed to the SDG route, maybe there is lots of juicy categorical semantics waiting to be discovered!

It is just that the category of Gaussian relations is currently getting a bit of attention, and has a complete equational theory, which could save you some work if it is actually useful for your purposes. On the one hand is expressive enough to have states which pick out points with no variance (reminiscent of infinitessimals); as well as on the other hand, states which uniformly relate the point to the whole space (reminiscent of infinities), which I think is the correct semantics for your white noise in this setting.

Owen Lynch (May 03 2024 at 21:14):

No, it's not a probability density, it's the result of integrating the distribution over an L^infty function. The problem with Gaussian relations is that solutions to SDEs involve all sorts of probability distributions, not just Gaussians!

Owen Lynch (May 03 2024 at 21:17):

Here's an overview I wrote up for the positive linear map stuff: https://www.localcharts.org/t/variable-first-probability-theory/1875

Owen Lynch (May 03 2024 at 21:19):

And here's a more rigorous reference: https://www.semanticscholar.org/paper/From-Kleisli-Categories-to-Commutative-C*-algebras%3A-Furber-Jacobs/e7e86c8cda2f5067449d6facce4a9a964c57fe31

Owen Lynch (May 03 2024 at 21:20):

I definitely want to learn more about characteristic functions though, so I'll check out some of your references!

John Baez (May 03 2024 at 21:39):

Owen Lynch said:

How do you formalize white noise??

Benedikt gave a good answer: you can describe white noise as a probability measure on the space of tempered distributions. A more simple-minded way of saying approximately the same thing is that instead of trying to define white noise $W(x)$ directly we define a real-valued random variable $W(f)$ for each 'test function' $f$ , which can be a Schwartz function or even an $L^2$ function. We heuristically think of $W(f)$ as meaning $\int W(x) f(x) dx$ , but we never say what $W(x)$ is directly. We can define these random variables $W(f)$ in such a way that each one has mean zero:

$E(W(f)) = 0$

but also

$E(W(f) W(g)) = \langle f, g \rangle$

for all $f, g$ . And so on: there are explicit formulas for all the expected values $E(W(f_1) \cdots W(f_n))$ . These closely mimic the usual formulas for the moments of a Gaussian on a finite-dimensional vector space, so we can think of white noise heuristically as a Gaussian on $L^2(\mathbb{R}^n)$ - but it's not really a probability measure on $L^2(\mathbb{R}^n)$ .

I helped my advisor write a book on this and related topics, but unfortunately it is not very easy to read.

Owen Lynch (May 04 2024 at 05:17):

OK, maybe I can think now about how to get a Brownian motion out of this. A Brownian motion should be a random variable that takes values in continuous maps $\gamma \colon \mathbb{R}_{\geq 0} \to \mathbb{R}^n$ . If we convert the equation

$\gamma'(t) = W(\gamma(t))$

into an integral equation

$\int f(t) \gamma'(t) dt = \int f(t) W(\gamma(t)) dt$

then we can "integrate by parts" to get

$\int f'(t) \gamma(t) dt = \int f(t) W(\gamma(t)) dt$

I'd then like to interpret $\int f(t) W(\gamma(t)) dt$ formally as something like $\gamma^\ast(W)(f)$ , assuming that there's some pullback operation on probability measures on the space of tempered distributions, so then I get

$\int f'(t) \gamma(t) dt = \gamma^\ast(W)(f)$

Is this the right track or am I totally off?

Owen Lynch (May 04 2024 at 05:29):

Cole Comfort said:

Owen Lynch This is a bit outside of my area of expertise, so please correct me if I am wrong.

It is just that the category of Gaussian relations is currently getting a bit of attention, and has a complete equational theory, which could save you some work if it is actually useful for your purposes. On the one hand is expressive enough to have states which pick out points with no variance (reminiscent of infinitessimals); as well as on the other hand, states which uniformly relate the point to the whole space (reminiscent of infinities), which I think is the correct semantics for your white noise in this setting.

I'm now reading https://drops.dagstuhl.de/storage/00lipics/lipics-vol270-calco2023/LIPIcs.CALCO.2023.13/LIPIcs.CALCO.2023.13.pdf, which I got from the reference in your recent paper, and I think it's very cool! Not exactly what I'm looking for w.r.t. infinitesimals, but definitely pertains to lots of other of my interests. Thanks for the recommendation! I may have to make a localcharts post about this soon.

John Baez (May 04 2024 at 06:18):

Owen Lynch said:

OK, maybe I can think now about how to get a Brownian motion out of this. A Brownian motion should be a random variable that takes values in continuous maps $\gamma \colon \mathbb{R}_{\geq 0} \to \mathbb{R}^n$ . If we convert the equation

$\gamma'(t) = W(\gamma(t))$

No, for Brownian motion we take white noise $W(t)$ as a function of time and solve

$\gamma'(t) = W(t)$

This is a lot easier to deal with!

Owen Lynch (May 04 2024 at 06:46):

Ah, that is a lot easier!!!

Owen Lynch (May 04 2024 at 06:46):

So I suppose for n-dimensional Brownian motion, we just take n white noises.

Owen Lynch (May 04 2024 at 06:48):

(i.e., n independent white noises)

Owen Lynch (May 04 2024 at 06:48):

I want to think about how this extends to SDEs; don't spoil it!

John Baez (May 04 2024 at 07:09):

I won't give it away. Note that with the way I've (sketchily) defined white noise and Brownian motion, for each interval $[a,b]$ we get a random variable

$\gamma(b) - \gamma(a) = \int_a^b W(t) dt := W(\chi_{[a,b]})$

and for disjoint intervals these random variables are stochastically independent. This is part of what we expect from Brownian motion: how it moves for some interval of time can't be predicted given how it's moved before.

Owen Lynch (May 04 2024 at 23:08):

OK, here's my guess. Given white noise, where $W(f) \colon \Omega \to \mathbb{R}$ , a solution to the SDE

$dX(t) = (v(X(t)) + \sigma(X(t))W(t))dt$

is a path-valued random variable $X$ such that for all $\omega \in \Omega$ , and test function $f$ ,

$-\int f'(t) X(t, \omega) dt = \int f(t) v(X(t, \omega)) dt + W(t \mapsto f(t)\sigma(X(t, \omega)))(\omega)$

Owen Lynch (May 04 2024 at 23:08):

Or probably "almost all"

Owen Lynch (May 04 2024 at 23:10):

In particular, we can take $f = \chi_{[a,b]}$ to get a formula for $X(b) - X(a)$

Owen Lynch (May 05 2024 at 00:00):

I suppose this shows how to define white noise, if you start with a Brownian motion $A$ , then define

$W(f) := - \int f'(t) A(t) dt$

John Baez (May 05 2024 at 07:15):

I don't really know a lot about stochastic differential equations (except for Brownian motion), so all I can honestly say is that your answer looks good to me. The last sentence is definitely true: a bunch of people assume they know what Brownian motion is, and define white noise to be the derivative of that, in the distributional sense:

$W(f) := - \int f'(t) A(t) dt$