Here is the document I will review with the group at tomorrow's meeting. Abstracts-and-Notes.pdf
It includes a digest of six papers that I have selected as candidates for our first reading. Several of the papers' authors are on here, so if anybody reads this and feels that I have done a poor job picking out the key points, please let me know.
I'm going to periodically update this post as I browse the papers. First, I'll write the key points and my interest. At the end, I'll choose the top 2.
McCullagh.
Numbers: 2, 5, 6, 11
Statistical model, parametrized statistical model, Bayesian model, extension of a model.
I'm interested in understanding how one compiles data into a statistical model (or what statistical models are compatible with that data), and how to make inferences based on that model. What is an extension of a model, and in what sense does inference require a model to be extendable? My interest in the categorical aspects is towards quantum mechanical generalizations (this applies to all papers unless otherwise specified).
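(A sketch of my own current understanding, not McCullagh's definitions, notation mine: a parametrized statistical model is a map $\theta \mapsto P_\theta$ from a parameter set $\Theta$ to probability distributions on a sample space $S$. I'm assuming an "extension" is a model on a larger sample space $S'$ (more observational units) that is compatible with the original one under restriction, which is presumably what licenses inference about units you haven't observed.)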
Morse and Sacksteder
Numbers: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11. Yikes, I'm sorry, all of this sounds interesting and is written in a language familiar to me (invariants, etc.).
Comparing experiments and sufficiency from a categorical perspective.
Simpson
Numbers: 5
I'm curious if Simpson claims that the Giry monad has a conceptual definition rather than a specific construction (probability measures on a space).
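(For concreteness, the specific construction I mean, in my own notation: the Giry monad $G$ sends a measurable space $X$ to the space $G(X)$ of probability measures on $X$, with unit $x \mapsto \delta_x$ (the Dirac measure) and multiplication $\mu(\Pi)(A) = \int_{G(X)} p(A)\, d\Pi(p)$ averaging a measure on measures. My question is whether Simpson offers an abstract characterization that picks this out without building it by hand.)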
Patterson
Numbers: 3, 5, 7
I have little coding experience, but if the requirements are minimal, I would be interested in picking some up for the purpose of examples. I'm thinking this could be useful in parametric Bayesian updating, which I am very interested in learning about.
Perrone
Numbers: 3, 6, 7
Probability monads, convex monotone maps as lax morphisms, disintegrations
My interest in this may vary, but ATM, it is more on lax affine morphisms and the section on disintegrations (for two separate projects, one on entropy and one on conditional expectations). I am curious to see how the adjunction related to the Kantorovich monad is explained in terms of Choquet theory.
Fritz
Numbers: 4, 7, 8, 12
Sufficient statistics, conditional independence.
I'd like to know what a statistic is, how it is related to a statistical model (from McCullagh), and what it means for such a statistic to be sufficient. I assume it means sufficient for prediction/extrapolation, so it seems also related to extendability of a model. I also want to know how much of probability and statistics can be phrased in the Markov category language (and what its limitations are).
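(For reference, the classical statement I would want the categorical version to recover, stated in standard notation rather than taken from Fritz: a statistic $T$ is sufficient for $\theta$ when the density factors as $f(x;\theta) = g(T(x);\theta)\,h(x)$, i.e. all dependence on $\theta$ passes through $T(x)$. I'm assuming the synthetic treatment in the paper abstracts this factorization.)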
My top two choices for papers are
1. Morse & Sacksteder
2. Fritz (although I have read much of it, I think it is a good place to start)
McCullagh
Morse & Sacksteder
Simpson
Patterson
Perrone
Fritz
Top papers: (1) Patterson, (2) Morse & Sacksteder
Generally, I think I am actually most interested in learning how the categorical modelling of statistical concepts relates to the actual practice of statistics, and in understanding the main obstructions to bringing the categorical language closer to that practice.
McCullagh - Interesting points: 4, 5. - I'm curious about how they model how something "makes sense".
Morse - Interesting points: 11 - Invariants are always interesting.
Simpson - Interesting points: 1, 3 - The idea of a probability sheaf is nice, and I imagine it's related to stuff like contextuality. I'd like to know more.
Patterson - 2, 7, 10 - I'm curious to know more about links between statistical models and logic.
Perrone - 2, 5 - I have no idea what Kantorovich duality is, but I'm interested!
Fritz - 1 - Markov categories seem interesting.
Top papers: (1) Simpson, (2) Patterson
Morse 1, 6 Connection of samples and experiments: can we relate how humans model decisions and beliefs in this same framework?
Simpson 3 Sheaves come up so much; I'd like to learn more about them applied to probability.
Patterson 2, 3, 4, 10 The whole paper, really. We are essentially data science experiments trying to navigate the world and update our mental models of it. Connecting the abstract categorical account with the pragmatics of dealing with data and making decisions is something I'm very interested in understanding better.
Fritz 1,4 Much of how we judge events and decide on actions is relative to a context or evidence. Applying the categorical notions to conditional probability is relevant and might provide constraints on models of behavior.
Top Papers (1) Patterson (2) Fritz
McCullagh (5, 7): The problem of formalizing what it means for a statistical model to have a natural extension to a larger set of observational units is a good one. I'm interested in how this might be related to replicability problems in science; see e.g. this paper on the "generalizability crisis". I think that category theory should have something to say about how statistical models from different experiments are compatible with each other (or not).
Morse and Sacksteder (10, 11, 12): I'm curious about the "complete set of invariants" for statistical isomorphism. IIRC, Cencov's book has some material on complete sets of invariants. It would be good to understand how this all fits together.
Simpson (1, 2, 3): I like the goal of formalizing the idea that probabilistic concepts (unlike measure-theoretic ones) must be invariant under extension of the underlying probability space.
Patterson: [I am the author, so I won't say anything here.]
Perrone (10, 11, 12): I'm interested in better understanding how orders on a measurable space can be extended to metrics on the space of probability measures on that space.
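(In case it helps, the metric I have in mind, stated as I know it rather than as the paper states it: for probability measures $p, q$ on a metric space, the Kantorovich (Wasserstein-1) distance is $W_1(p,q) = \sup \{ \int f\, dp - \int f\, dq : f \text{ 1-Lipschitz} \}$, which by Kantorovich duality equals $\inf_\gamma \int d(x,y)\, d\gamma(x,y)$ over couplings $\gamma$ of $p$ and $q$. The question is how an order on the base space, rather than a metric, lifts to a structure on the space of measures.)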
Fritz: This paper is the most comprehensive treatment of Markov categories yet written. Ch. 3-4 of my thesis build on ideas from this and other papers on categories of Markov kernels and their abstraction as Markov categories. The paper has lots of good examples and results.
Top papers to start with: (1) McCullagh, (2) Perrone
McCullagh: 2, 4, 5, 9, 10
I'm interested in how to represent statistical models algebraically for computation. This paper seems like a natural place to start.
Morse: 2, 3
This idea of isomorphism between "statistical systems" seems like it could be useful for reasoning about the relationships between various statistical approaches to scientific problems.
Simpson: 1
I know nothing of sheaves, but I'm open to learning about them.
Patterson: 1, 3, 4, 5, 10, 11
I am an applied statistician, scientist, and developer of statistical software, so this paper is the most directly relevant to me. I'm mostly interested in the algebra of statistical theories.
Perrone: 1
The heavy categorical language is all a bit beyond my current categorical knowledge, but I'm game to hang on for dear life and learn.
Fritz: 1, 3, 13
I really like the idea of understanding key statistical concepts in categorical terms. I'm curious how this will shape statistics in the future.
Top Papers: (1) McCullagh, (2) Patterson
For all the papers, my main interest is judging whether I can even follow them well enough to actually participate in this reading group.
Peter McCullagh
3
Norman Morse and Richard Sacksteder
3
Alex Simpson
1
Evan Patterson
2, 5, 7
Paolo Perrone
1
Tobias Fritz
1
Top Papers: (1) McCullagh, (2) Morse
McCullagh. 2, 3, 4, 8, 10. Defining the concepts in terms of category theory sounds interesting.
Morse. The abstract is not clear to me, so no input on that paper.
Simpson. 2, 3, 5. Presenting probability-theoretic notions in categorical fashions seems interesting.
Patterson. 2, 3, 7, 10. I am particularly interested in language-independent program representations and enriching those representations using ontologies.
Perrone. 1. The abstract is quite technical and I'm not able to parse it, so no summary.
Fritz. 1, 3, 10, 11. Similarly to Simpson, presentation of probability in categorical terms sounds interesting.
Top papers: (1) Patterson, (2) Fritz.
For me, it's hard to identify specific points to focus on or rank by relevance without at least skim-reading the papers, partly because the points are interlinked.
I last officially studied statistics some years ago, but was never entirely convinced by the way it was presented, which is why I'm here.
My two papers would be (1) Patterson and (2) Perrone.
I'm posting this for a friend (I'll try to send him an invite if I have the possibility to do that):
McCullagh 4. 8.
As a wannabe data scientist, I would like to learn how to use the categorical language (which I found pretty natural) to make my work as precise as possible.
Morse & Sacksteder 9. 10.
Equivalence notions are useful in general
Simpson 4. 5.
The Giry monad is used to define Bayesian networks categorically, and these are used in the field of Algorithmic Fairness to quantify bias
Patterson 2.
See McCullagh
Perrone 4. 5.
I'd like to learn something about monads
Fritz 1. 4.
Conditional independence is central to the inference of causal structures
(1) Simpson (2) Fritz
McCullagh: 2. 3. 4. 5. 6. 7. 10.
Statistics feels like a messy collection of practical recipes to me. Something conceptual like this might help me understand it better.
Morse/Sacksteder:
From the abstract I have no idea what this paper is about.
Simpson: 1.-5.
Sounds great, but I would rather just read it alone.
Patterson: 1.-5., 7.-11.
This sounds _awesome_!! The zoo of examples is great, the connection to software is great - it seems to be something that might make sense to the Machine Learning people in my department. Also great that we have the author with us.
Perrone:
Again, I am not sure what this is about.
Fritz: 1.-13.
I like the flexibility of this abstract approach. That makes it good to play with, and I already have an example I want to study (related to semantics of propositional logics).
Top papers:
(1) McCullagh (2) Patterson
pmarriot says:
McCullagh 3 5 11
I am interested in how this relates to Bayesian Decision Theory, and how it motivates priors.
Morse
I don't understand the motivation for the definitions.
Simpson 1 4 5
This seems pretty essential to understanding probability in general.
Patterson 2 3 5 11
This is not related to my work but it seems cool.
Perrone 1 4 5
The first couple chapters seem important but I don't know the terminology to discern what the later chapters are about.
Fritz 1 3 4 5 6 10 11
I would like to know more about Markov stuff and conditioning.
Top papers: (1) Simpson (2) Fritz
For me:
McCullagh: 4, 5, 10, 12
Morse: 3, 9, 11
Simpson: 1, 3, 4
Patterson: 2, 7, 9, 10
Perrone: 2, 5, 11, 16
Fritz: 1, 7, 11
top choices to start with: (1) McCullagh, (2) Perrone
McCullagh - What is a Statistical Model?
2, 3, 9, 10, 11, 4, 5
A categorical definition of a statistical model seems like a good first step in studying categorical statistics
Morse
3, 12, 1
Sounds like flow models in unsupervised learning: https://www.youtube.com/watch?v=JBb5sSC0JoY&list=PLwRJQ4m4UJjPiJP3691u-qWwPGVKzSlNP&index=3
Simpson
1,2,3
Sheaves are cool.
Patterson
1,2,3,5,7,10,11
I'm curious how this relates to McCullagh's definition of a statistical model. I'm interested in the ontology.
Perrone
I don't understand most of these points, so I can't mark the points most interesting to me.
Fritz
1, 7, 13
I'm interested in Markov Decision Processes, and I'd like to use a categorical perspective. I am more familiar with probability than statistics.
Top Papers: (1) McCullagh, (2) Fritz
Patterson The Algebra & Machine Representation of Statistical Models
4, 6, 8, 9, 10, 11
Fritz A synthetic approach to Markov kernels, conditional independence and theorems on sufficient statistics
9-13
If I counted correctly,
People's first/second/sum choices:
McCullagh 7/0/7
Morse 2/2/4
Simpson 3/0/3
Patterson 3/3/6
Perrone 1/3/4
Fritz 0/7/7
I've read Evan's thesis, and intend to read both Perrone & Fritz, Simpson, and McCullagh.
Arthur Parzygnat said:
If I counted correctly,
People's first/second/sum choices:
McCullagh 7/0/7
Morse 2/2/4
Simpson 3/0/3
Patterson 3/3/6
Perrone 1/3/4
Fritz 0/7/7
Thanks!
Thanks everybody for your feedback! I look forward to reading with you all and learning from each other.
Also, if anybody has additional papers that are not on the list here:
https://wiki.functorialwiki.org/act/show/Statistics+reading+group
Please post your suggestions! I want this list to be an easy go-to resource for people looking to get into categorical statistics and probability, so the more papers the better.