Here's a basic question. I know that probability is an active area of research in category theory. (I'm currently reading Tobias Fritz's 'A synthetic approach to Markov kernels', which is great.) I also know that there are a few papers (and quite a few scattered online conversations) about defining entropy in a category-theoretic way. I'm wondering how much further into information theory it is currently possible to go.
In particular, I'm interested to know if there's any work on characterising the Kullback-Leibler divergence and its special cases, the mutual information and conditional mutual information, in category-theoretic terms. I'd be especially interested to know about any approach based on information geometry, which would allow us to think about manifolds of probability distributions and the minimal Kullback-Leibler divergences between them. That seems like it should be fairly well suited to a category-theoretic approach - maybe you can get it by considering a category of convex maps between manifolds, or something like that? - and if you can define mixture families and exponential families along with the KL divergence then that's already enough to build a sizable chunk of information theory.
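For reference, these are the quantities I mean, in their standard (non-categorical) form:
$$
D_{\mathrm{KL}}(p \,\|\, q) = \sum_x p(x) \log \frac{p(x)}{q(x)}, \qquad
I(X;Y) = D_{\mathrm{KL}}\big(p_{XY} \,\|\, p_X \otimes p_Y\big), \qquad
I(X;Y \mid Z) = \sum_z p(z)\, I(X;Y \mid Z{=}z).
$$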
But generally anything to do with information theory and category theory would be great to know about.
or should this be under applied category theory? I'm unsure where the boundaries are.
Hi! Maybe the closest thing to a category-theoretic approach to information geometry is the work of Tom Leinster and others on magnitude. Some places to start are here, https://golem.ph.utexas.edu/category/2011/10/measuring_diversity.html, and here, https://golem.ph.utexas.edu/category/2016/08/a_survey_of_magnitude.html.
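In case it helps anyone reading along, the basic definition is short (this is the standard one from Leinster's work, not anything new): for a finite metric space $X = \{x_1, \dots, x_n\}$, form the matrix $Z_{ij} = e^{-d(x_i, x_j)}$; when $Z$ is invertible, the magnitude of $X$ is
$$
|X| = \sum_{i,j} (Z^{-1})_{ij}.
$$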
Besides Fritz-Leinster, there's also this paper by Gagne and Panangaden https://arxiv.org/abs/1703.08853
(I say this question would definitely have been on topic in the ACT stream)
If we're talking about category-theoretic characterizations of the Kullback-Leibler divergence, the first paper on that was not "Fritz-Leinster" (no such paper), but "Baez-Fritz":
https://arxiv.org/abs/1402.3067
We called it "relative entropy" instead of "Kullback-Leibler divergence" because that's a more informative term, and we called it a "Bayesian characterization" instead of a "category-theoretic characterization" because 1) it's both, and 2) we wanted people to read our paper.
The arguments in this paper were quite technical because we handled the case of relative entropy for two probability distributions that both vanish on some points, giving expressions like $0 \ln \frac{0}{0}$.
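Concretely (these are the standard conventions, stated here for readers rather than quoted from the paper):
$$
S(p \,\|\, q) = \sum_i p_i \ln \frac{p_i}{q_i}, \qquad 0 \ln \frac{0}{q_i} := 0, \qquad p_i \ln \frac{p_i}{0} := +\infty \ \text{ for } p_i > 0,
$$
and the delicate case is deciding what to do when $p_i = q_i = 0$.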
Later Tom Leinster, who'd worked with us earlier on the categorical characterization of Shannon entropy and Rényi entropy, simplified the argument by leaving out these tricky cases:
https://arxiv.org/abs/1712.04903
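For reference, the entropies in question, in standard form:
$$
H(p) = -\sum_i p_i \log p_i, \qquad
H_\alpha(p) = \frac{1}{1-\alpha} \log \sum_i p_i^\alpha \quad (\alpha \neq 1),
$$
with Rényi entropy $H_\alpha$ recovering Shannon entropy $H$ in the limit $\alpha \to 1$.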
However, I am still in awe of how Tobias Fritz pushed the argument through to handle all pairs of probability distributions on finite sets!
This is one of relatively few papers that combine category theory and arguments in the usual style of analysis - battles to the death with epsilons and deltas.
I happen to like that combination.
I'm super interested in these topics too: the general cases of relative entropy, but also the magnitude stuff. One of my collaborators has a paper on magnitude, though in a slightly different application sphere: https://arxiv.org/abs/1908.02692
I've wondered for years now if information geometry has a nice sheafy interpretation which makes things very natural and clear, but I've not had the brain power to see that through, or found any papers that do it – apologies for vagueness.
Several people have asked about information geometry and category theory. Some of the earliest, maybe the earliest, explicitly categorical work on the category of Markov kernels was done by N. N. Cencov. The main reference is the 1982 English translation of his Russian monograph. In addition to category theory, this work is full of differential geometry and helped pioneer information geometry before that term existed. I haven't read the more differential-geometric parts of Cencov's book, so I can't say too much more.
Later work on information geometry seems to have deliberately stripped out the categorical language, see: https://doi.org/10.1090/S0002-9939-1986-0848890-5 and https://arxiv.org/abs/1207.4139
Maybe one of us should put it back :)
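For anyone who hasn't met it, the category in question is easy to state in its simplest finite form (this is the standard definition, not necessarily Cencov's exact formulation): objects are finite sets, a morphism $f : X \to Y$ is a Markov kernel, i.e. a probability distribution $f(y \mid x)$ on $Y$ for each $x \in X$, and composition is given by the Chapman-Kolmogorov formula
$$
(g \circ f)(z \mid x) = \sum_{y} g(z \mid y)\, f(y \mid x).
$$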
I've had Čencov's book page open in a tab for some time; I wonder if I heard about it from you. I haven't gotten around to trying to get a pdf yet: https://www.ams.org/books/mmono/053/
Evan Patterson said:
Several people have asked about information geometry and category theory. [...]
Thanks for these references! I definitely would be happy to see it returned. :D
A bit off-topic: are there good resources to learn about information geometry? In particular, by skimming Amari's 1993 and 2016 books, I've encountered the Fisher metric and the e- and m-connections, but I haven't seen theorems that really leverage this framework. Specifically, the (beautiful!) results I've seen so far either don't connect back to inference --- e.g.
--- or are purely local results and hence could be discovered and applied entirely using Taylor series --- e.g.
So I'm wondering: are there any global theorems that use geometric data to talk about statistical inference? (Analogous to the global theorems of Riemannian geometry, e.g. that nearly-constantly-positively-curved compact manifolds are covered by spheres.)
For example, it would be neat to say that:
"The Rademacher complexity {statistics concept} of a compact statistical manifold is bounded by its volume {global geometric concept}.?."
or that
"On a compact statistical manifold, the number of local minima for the EM algorithm is bounded by some topological invariant of the manifold {topological/geometric, perhaps in the spirit of ChernBonnet}.?.
"
Have you seen Nihat Ay’s text? (Re: the first question.)
Sam Tenka (naive student) said:
A bit off-topic: are there good resources to learn about information geometry?
I don't know if they're good resources, but I wrote a bunch of blog articles on information geometry:
They will at least give a viewpoint that's a bit different from the standard one.
Matt Cuffaro (he/him) said:
Have you seen Nihat Ay’s text? Re first question
Ooh! No, I haven't. It looks great! Thanks for the pointer.
John Baez said:
Sam Tenka (naive student) said:
A bit off-topic: are there good resources to learn about information geometry?
I don't know if they're good resources, but I wrote a bunch of blog articles on information geometry:
They will at least give a viewpoint that's a bit different from the standard one.
Woah! InfoGeometry of evolution?! This looks wonderful!
Beginner here, so bear with me: I was just curious if there's an information-theoretic interpretation of CT, e.g., a measure like Kolmogorov complexity wrt the "simplest" category, or compression wrt structure invariance? I also warmly welcome an explanation of why this doesn't make sense :].
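(For reference, the standard non-categorical definition being alluded to: the Kolmogorov complexity of a string $x$ relative to a universal machine $U$ is
$$
K_U(x) = \min \{\, |p| : U(p) = x \,\},
$$
the length of the shortest program that outputs $x$; it is machine-independent up to an additive constant.)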
This paper might interest you.
Eventually, Yanofsky should be coming out with a book on ‘theoretical CS for category theorists’ so I imagine that will be applicable too.
oh fantastic, ty
a bit confused why infinitary constructions in CT should be different from infinitary lambda calculus in this sense; but I have some homework to do on this first
Universal constructions and free constructions do not add any information to a given structure. E.g. the path category just adds to a graph exactly the structure needed to fulfil the requirements of being a category. There is "no extra information" added. I think the "theorems for free" idea is similar: no extra information can be used to do something goofy with a parametrically polymorphic function.
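To sketch the construction just mentioned (this is the standard free category on a graph): for a directed graph $G$,
$$
\mathrm{Ob}(\mathbf{Path}(G)) = V(G), \qquad \mathbf{Path}(G)(u, v) = \{\text{finite edge paths from } u \text{ to } v\},
$$
with composition by concatenation and the empty path at $v$ as $\mathrm{id}_v$. Everything is determined by $G$ alone - it's the left adjoint to the forgetful functor from categories to graphs - which is the precise sense in which no extra information is added.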
I have not seen this expressed very often, maybe it is clear for most category theorists. One instance I know of is reported in Morphisms and Categories: Comparing and Transforming by Edgar Ascher. Seymour Papert describes the universal construction of a product $A \xleftarrow{p} A \times B \xrightarrow{q} B$, where $p$ and $q$ are the projections and $A \times B$ is the product: "All messages emitted toward $A$ and $B$ can be transmitted by $A \times B$".
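Spelled out (the letters here are mine, not Papert's): for any object $C$ with morphisms $f : C \to A$ and $g : C \to B$, there is a unique morphism through which both factor,
$$
\exists!\, \langle f, g \rangle : C \to A \times B \quad \text{with} \quad p \circ \langle f, g \rangle = f, \qquad q \circ \langle f, g \rangle = g,
$$
so any "messages" $C$ sends toward $A$ and $B$ can indeed be routed through this single morphism into the product.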