Here's a basic question. I know that probability is an active area of research in category theory. (I'm currently reading Tobias Fritz's 'A synthetic approach to Markov kernels', which is great.) I also know that there are a few papers (and quite a few scattered online conversations) about defining entropy in a category-theoretic way. I'm wondering how much further into information theory it is currently possible to go.
In particular, I'm interested to know if there's any work on characterising the Kullback-Leibler divergence and its special cases, the mutual information and conditional mutual information, in category-theoretic terms. I'd be especially interested to know about any approach based on information geometry, which would allow us to think about manifolds of probability distributions and the minimal Kullback-Leibler divergences between them. That seems like it should be fairly well suited to a category-theoretic approach - maybe you can get it by considering a category of convex maps between manifolds, or something like that? - and if you can define mixture families and exponential families along with the KL divergence then that's already enough to build a sizable chunk of information theory.
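For reference, these are the quantities I mean, in their standard (non-categorical) form:
$$
D_{\mathrm{KL}}(p \,\|\, q) = \sum_x p(x) \log \frac{p(x)}{q(x)}, \qquad
I(X;Y) = D_{\mathrm{KL}}\big(p_{XY} \,\|\, p_X \otimes p_Y\big), \qquad
I(X;Y \mid Z) = \sum_z p(z)\, I(X;Y \mid Z{=}z).
$$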
But generally anything to do with information theory and category theory would be great to know about.
or should this be under applied category theory? I'm unsure where the boundaries are.
Hi! Maybe the closest thing to a category-theoretic approach to information geometry is the work of Tom Leinster and others on magnitude. Some places to start are here, https://golem.ph.utexas.edu/category/2011/10/measuring_diversity.html, and here, https://golem.ph.utexas.edu/category/2016/08/a_survey_of_magnitude.html.
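In case it helps anyone reading along, the basic definition is short (this is the standard one from Leinster's work, not anything new): for a finite metric space $X = \{x_1, \dots, x_n\}$, form the matrix $Z_{ij} = e^{-d(x_i, x_j)}$; when $Z$ is invertible, the magnitude of $X$ is
$$
|X| = \sum_{i,j} (Z^{-1})_{ij}.
$$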
Besides Fritz-Leinster, there's also this paper by Gagne and Panangaden https://arxiv.org/abs/1703.08853
(I say this question would definitely have been on topic in the ACT stream)
If we're talking about category-theoretic characterizations of the Kullback-Leibler divergence, the first paper on that was not "Fritz-Leinster" (no such paper), but "Baez-Fritz":
https://arxiv.org/abs/1402.3067
We called it "relative entropy" instead of "Kullback-Leibler divergence" because that's a more informative term, and we called it a "Bayesian characterization" instead of a "category-theoretic characterization" because 1) it's both, and 2) we wanted people to read our paper.
The arguments in this paper were quite technical because we handled the case of relative entropy for two probability distributions that both vanish on some points, giving expressions like $0 \ln \frac{0}{0}$.
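Concretely (these are the standard conventions, stated here for readers rather than quoted from the paper):
$$
S(p \,\|\, q) = \sum_i p_i \ln \frac{p_i}{q_i}, \qquad 0 \ln \frac{0}{q_i} := 0, \qquad p_i \ln \frac{p_i}{0} := +\infty \ \text{ for } p_i > 0,
$$
and the delicate case is deciding what to do when $p_i = q_i = 0$.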
Later Tom Leinster, who'd worked with us earlier on the categorical characterization of Shannon entropy and Rényi entropy, simplified the argument by leaving out these tricky cases:
https://arxiv.org/abs/1712.04903
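For reference, the entropies in question, in standard form:
$$
H(p) = -\sum_i p_i \log p_i, \qquad
H_\alpha(p) = \frac{1}{1-\alpha} \log \sum_i p_i^\alpha \quad (\alpha \neq 1),
$$
with Rényi entropy $H_\alpha$ recovering Shannon entropy $H$ in the limit $\alpha \to 1$.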
However, I am still in awe of how Tobias Fritz pushed the argument through to handle all pairs of probability distributions on finite sets!
This is one of relatively few papers that combine category theory and arguments in the usual style of analysis - battles to the death with epsilons and deltas.
I happen to like that combination.
I'm super interested in these topics too: the general cases of relative entropy, but also the magnitude stuff. One of my collaborators has a paper on magnitude, though in a slightly different application sphere: https://arxiv.org/abs/1908.02692
I've wondered for years now if information geometry has a nice sheafy interpretation which makes things very natural and clear, but I've not had the brain power to see that through, or found any papers that do it – apologies for vagueness.
Several people have asked about information geometry and category theory. Some of the earliest, maybe the earliest, explicitly categorical work on the category of Markov kernels was done by N. N. Cencov. The main reference is the 1982 English translation of his Russian monograph. In addition to category theory, this work is full of differential geometry and helped pioneer information geometry before that term existed. I haven't read the more differential-geometric parts of Cencov's book, so I can't say too much more.
Later work on information geometry seems to have deliberately stripped out the categorical language, see: https://doi.org/10.1090/S0002-9939-1986-0848890-5 and https://arxiv.org/abs/1207.4139
Maybe one of us should put it back :)
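For anyone who hasn't met it, the category in question is easy to state in its simplest finite form (this is the standard definition, not necessarily Cencov's exact formulation): objects are finite sets, a morphism $f : X \to Y$ is a Markov kernel, i.e. a probability distribution $f(y \mid x)$ on $Y$ for each $x \in X$, and composition is given by the Chapman-Kolmogorov formula
$$
(g \circ f)(z \mid x) = \sum_{y} g(z \mid y)\, f(y \mid x).
$$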
I've had Čencov's book page open in a tab for some time; I wonder if I heard about it from you. I haven't gotten around to trying to get a pdf yet: https://www.ams.org/books/mmono/053/
Evan Patterson said:
Several people have asked about information geometry and category theory. [...]
Thanks for these references! I definitely would be happy to see it returned. :D
A bit off-topic: are there good resources to learn about information geometry? In particular, by skimming Amari's 1993 and 2016 books, I've encountered the Fisher metric and the e- and m-connections, but I haven't seen theorems that really leverage this framework. Specifically, the (beautiful!) results I've seen so far either don't connect back to inference --- e.g.
--- or are purely local results and hence could be discovered and applied entirely using Taylor series --- e.g.
So I'm wondering: are there any global theorems that use geometric data to talk about statistical inference? (Analogous to the global theorems of Riemannian geometry, e.g. that nearly-constantly-positively-curved compact manifolds are covered by spheres.)
For example, it would be neat to say that:
"The Rademacher complexity {statistics concept} of a compact statistical manifold is bounded by its volume {global geometric concept}.?."
or that
"On a compact statistical manifold, the number of local minima for the EM algorithm is bounded by some topological invariant of the manifold {topological/geometric, perhaps in the spirit of ChernBonnet}.?.
"
Have you seen Nihat Ay’s text? (Re: the first question.)
Sam Tenka (naive student) said:
A bit off-topic: are there good resources to learn about information geometry?
I don't know if they're good resources, but I wrote a bunch of blog articles on information geometry:
They will at least give a viewpoint that's a bit different from the standard one.
Matt Cuffaro (he/him) said:
Have you seen Nihat Ay’s text? Re first question
Ooh! No, I haven't. It looks great! Thanks for the pointer.
John Baez said:
Sam Tenka (naive student) said:
A bit off-topic: are there good resources to learn about information geometry?
I don't know if they're good resources, but I wrote a bunch of blog articles on information geometry:
They will at least give a viewpoint that's a bit different from the standard one.
Woah! InfoGeometry of evolution?! This looks wonderful!
Beginner here, so bear with me: I was just curious if there's an information-theoretic interpretation of CT, e.g., a measure like Kolmogorov complexity wrt the "simplest" category, or compression wrt structure invariance? I also warmly welcome an explanation of why this doesn't make sense :].
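(For reference, the standard non-categorical definition being alluded to: the Kolmogorov complexity of a string $x$ relative to a universal machine $U$ is
$$
K_U(x) = \min \{\, |p| : U(p) = x \,\},
$$
the length of the shortest program that outputs $x$; it is machine-independent up to an additive constant.)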
This paper might interest you.
Eventually, Yanofsky should be coming out with a book on ‘theoretical CS for category theorists’ so I imagine that will be applicable too.
oh fantastic, ty
a bit confused why infinitary constructions in CT should be different from infinitary lambda calculus in this sense; but I have some homework to do on this first
Universal constructions and free constructions do not add any information to a given structure. E.g. the path category just adds to a graph exactly the structure needed to fulfil the requirements of being a category. There is "no extra information" added. I think the "theorems for free" idea is similar: no extra information can be used to do something goofy with a parametrically polymorphic function.
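To sketch the construction just mentioned (this is the standard free category on a graph): for a directed graph $G$,
$$
\mathrm{Ob}(\mathbf{Path}(G)) = V(G), \qquad \mathbf{Path}(G)(u, v) = \{\text{finite edge paths from } u \text{ to } v\},
$$
with composition by concatenation and the empty path at $v$ as $\mathrm{id}_v$. Everything is determined by $G$ alone - it's the left adjoint to the forgetful functor from categories to graphs - which is the precise sense in which no extra information is added.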
I have not seen this expressed very often, maybe it is clear for most category theorists. One instance I know of is reported in Morphisms and Categories: Comparing and Transforming by Edgar Ascher. Seymour Papert describes the universal construction of a product $A \xleftarrow{p} A \times B \xrightarrow{q} B$, where $p$ and $q$ are the projections and $A \times B$ is the product: "All messages emitted toward $A$ and $B$ can be transmitted by $A \times B$".
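Spelled out (the letters here are mine, not Papert's): for any object $C$ with morphisms $f : C \to A$ and $g : C \to B$, there is a unique morphism through which both factor,
$$
\exists!\, \langle f, g \rangle : C \to A \times B \quad \text{with} \quad p \circ \langle f, g \rangle = f, \qquad q \circ \langle f, g \rangle = g,
$$
so any "messages" $C$ sends toward $A$ and $B$ can indeed be routed through this single morphism into the product.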