You're reading the public-facing archive of the Category Theory Zulip server.
To join the server you need an invite. Anybody can get an invite by contacting Matteo Capucci at name dot surname at gmail dot com.
For all things related to this archive refer to the same person.
image.png
I was reading Marc Harper's paper "inference as replicator dynamics". I don't understand why the second equality holds. What am I missing? The original paper is here: https://arxiv-benchmark.informatik.uni-freiburg.de/data/benchmark/pdf/0911/0911.1763.pdf
OK, never mind, I found the answer in the paper. Average fitness is defined as the population-weighted mean, fbar = sum_i x_i f_i. Interesting.
Yeah, if 99 people have fitness 0 and 1 has fitness 1 the average fitness is 1/100, not 50/100.
By the way, this is not "John Baez's paper with Marc Harper" - you can see the author is Marc Harper.
Oh my bad :sweat_smile:
What is the rationale behind this? It sounds reasonable to me, but I failed to explain it myself
So if you had 100 friends, and 99 of them earned $0/year, and 1 of them earned $1,000,000/year, what would you say their average income was?
I see, it has to be weighted by population, not by categories. So $10,000.
Right. And if 99 people have fitness 0 and 1 has fitness 1 the average fitness is 1/100. This is because there are 100 people and they only have one child who survives.
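Both averages in the income example can be checked in a few lines (numbers taken from the example above):

```python
# Average income weighted by population (one entry per person), not by category.
incomes = [0] * 99 + [1_000_000]        # 99 friends earn $0, one earns $1,000,000
population_avg = sum(incomes) / len(incomes)
category_avg = (0 + 1_000_000) / 2      # the misleading per-category average

print(population_avg)   # 10000.0
print(category_avg)     # 500000.0
```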
@John Baez I'm thinking of extending this replicator dynamics to a bigger space, e.g. the probability assignment over the whole space of possible events 2^Ω instead of only the atomic events ω ∈ Ω, while relaxing some basic assumptions of probability theory, e.g. the law of excluded middle (P(A) + P(A^c) = 1 for all A), as is usually assumed in probability theory, to account for underparametrization. In this way the Bayesian dynamics / geometry are a limiting case of this dynamics / geometry in a bigger space. Are there any references (maybe in geometry or dynamical systems?) that you would recommend for looking into this problem? I think this is a problem that could be part of your initiative of uncertainty assessment for climate.
The problem seems more "topological" than "geometrical"; it doesn't look like classical differential geometry?
I saw several links on piecewise linear manifolds, but I'm not sure if they're relevant. A question could be: what kind of piecewise linear manifold can recover a Riemannian metric as a special / limiting case? It looks like a method used a lot in quantum gravity.
Sorry, I have no advice for you: your thoughts are too fragmentary for me to grasp them.
The question is how to generalize the inference dynamics & geometry without the assumption that they only work on probabilities of singleton events, and without the assumptions of probability theory as they stand in Marc Harper's paper.
As an example: for a hypothesis space Ω we're interested in studying the dynamics on the full event space 2^Ω, but Marc Harper only studied dynamics on the atomic events ω ∈ Ω. If we assume the law of excluded middle, P(A) + P(A^c) = 1, then the higher-order events don't really need to be considered, since their probabilities can be deduced. The question is what happens geometrically if we take out this assumption.
Okay. But that's not really a math question. Generalizing information geometry to some weird version of probability theory where we drop the assumption P(A) + P(A^c) = 1 is an open-ended research project, not a "question".
I would like to know if there are existing tools in mathematics (e.g. topology, category theory, etc.) that can handle this open-ended question. Dempster-Shafer theory is a generalization of Bayesian inference, so since Bayesian inference dynamics has a good geometric interpretation, I think Dempster-Shafer theory should too.
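Since Dempster-Shafer theory keeps coming up without a definition: a mass function puts weight on subsets of the hypothesis space rather than only on singletons, so additivity can fail. A minimal sketch (the hypothesis names and mass values here are made up for illustration, not taken from any paper in the thread):

```python
# A Dempster-Shafer mass function assigns mass to subsets of the frame Omega,
# not just to singletons; mass on Omega itself models unresolved uncertainty.
# Hypothesis names and mass values are illustrative.
omega = frozenset({"H1", "H2"})
m = {
    frozenset({"H1"}): 0.5,
    frozenset({"H2"}): 0.2,
    omega: 0.3,
}

def belief(A):
    # Bel(A): total mass committed to subsets of A
    return sum(v for B, v in m.items() if B <= A)

def plausibility(A):
    # Pl(A): total mass not contradicting A (i.e. intersecting it)
    return sum(v for B, v in m.items() if B & A)

A = frozenset({"H1"})
print(belief(A), plausibility(A))        # 0.5 0.8
print(belief(A) + belief(omega - A))     # 0.7: Bel(A) + Bel(A^c) = 1 fails
```

When all the mass sits on singletons, Bel = Pl and the ordinary probability picture is recovered, which is the sense in which Bayesian inference is the limiting case.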
If you don't mind some advice: I think it would be very good for you to learn how to ask the usual sort of math question, where you either ask if some clearly well-formed statement is true, or ask how to prove some such statement. For example, "are all functions in L^2[0,1] also in L^1[0,1]?", where the answer is "yes".
Hmm, I think the question here is that I'm trying to find a math framework, if any exists; if not, invent one. Maybe the first step is to define it more precisely?
Yes. I have no idea what you're trying to do, so I can't help you do it. And to be very honest, I'm actually afraid you don't know what you're trying to do, either.
It might help if I knew Dempster-Shafer theory - maybe then I could guess what you're trying to do. But I don't know that stuff.
By what criterion is "knowing what one's trying to do" satisfied? I want to know if the stability of the inference dynamics of a well-known generalization of Bayesian inference can be studied in differential geometry, as it is studied for Bayesian inference by Marc Harper. But I think your advice means to write up a formalism that embodies both Bayesian inference and Dempster-Shafer inference, so it becomes a pure math problem.
I get the sense that knowing what to do means you already have an answer to a question. But then I guess you don't need to ask any questions.
Maybe someone who was an expert on Dempster-Shafer theory and information geometry could take a vague question like
I want to know if stability of the inference dynamics of a well-known generalization of Bayesian inference can be studied in differential geometry, like it is studied in Bayesian inference by Marc Harper.
and tell you something interesting about it. But I can't.
You often seem to ask very open-ended questions. They make me feel you're asking someone else to do your research for you.
Mathematicians can usually help more when you ask more precise questions - the sort of question you're able to ask when you have a specific plan, and you need to know if some particular statement is true.
John Baez said:
Mathematicians can usually help more when you ask more precise questions - the sort of question you're able to ask when you have a specific plan, and you need to know if some particular statement is true.
I believe this! But this mode of interaction with mathematicians does require figuring out helpful precise questions to ask. And I think learning how to ask such questions takes a lot of time, experience, and work. I wish I knew how to do this better myself.
For what it's worth, @Peiyuan Zhu, my theory is that learning how to ask precise questions (and develop a precise research plan) can be accomplished by (1) getting a solid basic foundation in the area you want to study (doing lots of exercises, and asking questions about those) and (2) reading and understanding in detail papers that people have written in related areas (and talking to people about these). I am still working on doing this myself, but my hope is that once (1) and (2) are accomplished it becomes easier to ask questions that are both interesting and sufficiently precise. Maybe such questions can be generated by modifying questions already asked and answered in previously published papers, at least to start with.
I'm sure many people here can offer a more insightful perspective on this process than me, though.
John Baez said:
Maybe someone who was an expert on Dempster-Shafer theory and information geometry could take a vague question like
I want to know if stability of the inference dynamics of a well-known generalization of Bayesian inference can be studied in differential geometry, like it is studied in Bayesian inference by Marc Harper.
and tell you something interesting about it. But I can't.
@Peiyuan Zhu a viable method is to distill the essential details of the thing you want to ask about into a summary that gives others the context needed to understand your question. For instance, rather than pasting several pages of a book or saying a name like "Dempster-Shafer theory" (which I also do not know the content of), give a paragraph summary of what you have understood or a specific example and point to the thing you don't understand. The longer the summary, the smaller the chance of engagement, but it will at least significantly lower the effort required by someone trying to engage with your question.
For this specific topic, you only have a small chance of getting a satisfying answer: either someone has tried it somewhere and someone reading this topic has seen that work and can point you to it, or no one here knows (which is likely: Dempster-Shafer theory isn't directly categorical, and is deep enough into probability theory that even the categorical probability people here may not have seen it) in which case you'll just have to try it for yourself and find out or ask somewhere else. This is a space for discussing category theory, we don't know everything!
Morgan Rogers (he/him) said:
John Baez said:
Maybe someone who was an expert on Dempster-Shafer theory and information geometry could take a vague question like
I want to know if stability of the inference dynamics of a well-known generalization of Bayesian inference can be studied in differential geometry, like it is studied in Bayesian inference by Marc Harper.
and tell you something interesting about it. But I can't.
Peiyuan Zhu a viable method is to distill the essential details of the thing you want to ask about into a summary that gives others the context needed to understand your question. For instance, rather than pasting several pages of a book or saying a name like "Dempster-Shafer theory" (which I also do not know the content of), give a paragraph summary of what you have understood or a specific example and point to the thing you don't understand. The longer the summary, the smaller the chance of engagement, but it will at least significantly lower the effort required by someone trying to engage with your question.
For this specific topic, you only have a small chance of getting a satisfying answer: either someone has tried it somewhere and someone reading this topic has seen that work and can point you to it, or no one here knows (which is likely: Dempster-Shafer theory isn't directly categorical, and is deep enough into probability theory that even the categorical probability people here may not have seen it) in which case you'll just have to try it for yourself and find out or ask somewhere else. This is a space for discussing category theory, we don't know everything!
I like the comment "Dempster-Shafer theory isn't directly categorical": I can see that the way it is used has deep categorical structures, but it isn't immediately obvious how it can be categorified -- it suggests that some modification of the theory, or a different perspective, is needed.
I typeset a research proposal explaining this in more detail. Would it be suitable to post it here? Or shall I move this to the #practice: our work or #practice: our papers channels?
@Peiyuan Zhu do you know an advisor / professor who knows you better and could give you advice depending on your specific situation? If you want to do research on this subject, you have to take into account which person you can do it with, etc. There are aspects which are not strictly mathematical, and we don't have all the information to help you with those.
I've been looking for some people to critique this recently. I just heard back from several of them, and I'm ready to meet with some of them next week.
I'm trying this evolutionary dynamical system on a simple coin tossing model to make sure I understand the concepts.
The coin tossing model has two hypotheses: x_1 is a fair coin, p(H|x_1) = 1/2, and x_2 is a coin that only has heads, p(H|x_2) = 1.
Suppose we have a uniform prior, x_1 = x_2 = 1/2.
Suppose we observe a head, E = H.
Marc Harper's paper https://arxiv.org/pdf/0911.1763.pdf suggests that we can analyze such an inference problem by solving a replicator dynamic: dx_i/dt = x_i(p(E|x_i) - sum_j x_j p(E|x_j)).
With this evolutionary dynamics, I investigated two questions according to the paper.
First question: Is the posterior a fixed point?
Answer: No
Second question: Does posterior minimize KL-divergence near the fixed point?
Answer: No
However, in my reading of the paper, at least one of the above two questions should have the answer "yes". What am I missing in my understanding of the paper?
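One check worth separating out: Harper's exact correspondence is with the discrete-time replicator map x_i' = x_i f_i / fbar, which with fitness f_i = P(E|H_i) is literally Bayes' rule. A quick numerical sanity check (the prior and likelihood values are made up for illustration):

```python
# Discrete-time replicator step: x_i' = x_i * f_i / fbar, with fitness f_i = P(E|H_i).
# Under this identification the update is exactly Bayes' rule.
# Prior and likelihood values are illustrative.
prior = [0.3, 0.7]            # P(H1), P(H2)
likelihood = [0.9, 0.4]       # P(E|H1), P(E|H2)

fbar = sum(x * f for x, f in zip(prior, likelihood))        # mean fitness = P(E)
replicator_step = [x * f / fbar for x, f in zip(prior, likelihood)]

evidence = sum(p * l for p, l in zip(prior, likelihood))    # P(E), as in Bayes' rule
posterior = [p * l / evidence for p, l in zip(prior, likelihood)]

print(replicator_step == posterior)   # True: the two updates coincide term by term
```

The questions about fixed points concern the continuous-time flow, which is a different object from this one-step map, so the two "yes" candidates need not behave the same way.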
Did you post this on Stack Exchange? I suspect there are more people who would be able to answer your question there (although I appreciate that you have taken some of the earlier advice on asking questions on board!)
Do you mean math stack-exchange https://math.stackexchange.com?
I just made posts here https://mathoverflow.net/questions/441129/bayesian-inference-as-replicator-dynamics and here https://math.stackexchange.com/questions/4641999/bayesian-inference-as-replicator-dynamics but nobody has replied so far.
Morgan Rogers (he/him) said:
Did you post this on Stack Exchange? I suspect there are more people who would be able to answer your question there (although I appreciate that you have taken some of the earlier advice on asking questions on board!)
I agree with Morgan, your question is much clearer than the previous ones!
Third question: Is there a more general formula than the one calculated above?
Answer: Yes, this is a logistic equation: writing x_2 = 1 - x_1, the two-type system reduces to dx1/dt = (p(E|x_1) - p(E|x_2)) * x1 * (1 - x1).
This topic was moved here from #learning: questions > evolutionary game by Matteo Capucci (he/him).
Some potential problems
[1] The result only holds for higher-dimensional dynamics.
[2] The result only holds for the discrete replicator dynamics.
Peiyuan Zhu said:
Some potential problems
[1] The result only holds for higher-dimensional dynamics.
[2] The result only holds for the discrete replicator dynamics.
Response to potential problems:
[1] The high dimensional replicator dynamics would experience the same problem of degenerate solution.
[2] The KL-divergence result is indeed for continuous time replicator dynamics.
For this system, there isn't a rest point in the interior of the simplex.
And maybe it's because I didn't understand this proof.
image.png
I think the definition of a Lyapunov function here is a function that is decreasing near an equilibrium point, with the replicator equation substituted in, etc. But it doesn't say anything about the situation where the equilibrium doesn't exist, does it?
If there's no rest point there's no ESS (evolutionarily stable state) so the theorem implies that for no point is the Kullback-Leibler divergence a local Lyapunov function.
So when the paper says "The replicator equation can now be understood as modeling the informational dynamics of the population distribution, moving in the direction of maximal local increase of potential with respect to the Fisher information, and ultimately converging to a minimal potential information state if a stabilizing state (ESS) exists in the interior of the state space", it means this only holds if an ESS exists. But if an ESS normally doesn't exist for Bayesian inference, this paper isn't fair to its title in saying "replicator equation as an inference dynamics". Am I correct?
That question is too vague and subjective to answer. Focus on the theorems, not whether it's "fair" to title the paper a certain way.
By "fair" I mean the solution of the replicator is in one-to-one correspondence with the Bayesian posterior. So can I understand the above sentence from the paper as "the rest point of the replicator is the Bayesian posterior obtained by minimizing the Lyapunov function if and only if the rest point is an ESS"? If I want to verify this statement with a numerical example, I would need to find an inference problem that has an ESS first. The paper doesn't say anything about when an inference problem has an ESS, so I can only try fairly arbitrary fitness functions myself, am I correct?
The theorem says exactly what it says: a state x̂ is an interior ESS for the replicator equation if and only if the Kullback-Leibler divergence D(x̂ || x) is a local Lyapunov function.
If you want a fun example of this theorem, pick a replicator equation that has an interior ESS.
I see, now it makes sense. I tried two examples already, but an ESS doesn't exist in either case. In 2d an interior ESS certainly doesn't exist. There are quite a lot of choices in 3d, but the previous example I tried above doesn't seem to have an interior ESS either. There are so many cases where an ESS doesn't exist; I'll keep trying. At least from the above example I know that the replicator divides the simplex into 2 × 2 × 2 = 8 possible regions with varying signs of the derivatives. The case where an interior ESS exists should be exactly the case where the three lines cross at one point on the simplex. This is an extremely small fraction of all legitimate inference problems.
It'd be interesting to see what inference conditions correspond to the existence of an ESS.
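For what it's worth, in this Bayesian setting the fitnesses are the constants f_i = p(E|H_i), so the condition for an interior rest point can be worked out directly (a small sketch, my own derivation rather than something quoted from the paper):

```latex
% An interior rest point requires every type's fitness to equal the mean fitness:
\dot{x}_i = x_i\,(f_i - \bar{f}) = 0, \quad x_i > 0
\;\Longrightarrow\; f_i = \bar{f} \ \text{for all } i.
% With constant fitnesses f_i = p(E \mid H_i), this forces
p(E \mid H_1) = p(E \mid H_2) = \cdots = p(E \mid H_n),
% i.e. an interior rest point exists only when every hypothesis assigns the
% same likelihood to the observed evidence -- a degenerate, measure-zero set
% of inference problems.
```

That would match the difficulty of finding an interior ESS by trial and error.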
OK, it looks like I still couldn't find an interior ESS.
Coin tossing model.
Observe H.
Two possible coins, p(H|x_i) = a, b:
dx1/dt = x1(a - a*x1 - b*x2)
dx2/dt = x2(b - a*x1 - b*x2)
There's no interior ESS.
Three possible coins, p(H|x_i) = a, b, c:
dx1/dt = x1(a - a*x1 - b*x2 - c*x3)
dx2/dt = x2(b - a*x1 - b*x2 - c*x3)
dx3/dt = x3(c - a*x1 - b*x2 - c*x3)
There's no interior ESS.
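The two-coin system above can be integrated numerically to see where trajectories actually go (a quick sketch; the likelihood values a = 0.9 and b = 0.4 are made up for illustration):

```python
# Euler integration of the continuous replicator dynamic for the two-coin model:
#   dx1/dt = x1*(a - fbar),  dx2/dt = x2*(b - fbar),  fbar = a*x1 + b*x2
# The likelihood values a, b are illustrative.
a, b = 0.9, 0.4
x1, x2 = 0.5, 0.5          # uniform prior
dt = 0.01
for _ in range(10_000):    # integrate up to t = 100
    fbar = a * x1 + b * x2
    x1, x2 = x1 + dt * x1 * (a - fbar), x2 + dt * x2 * (b - fbar)

# The trajectory runs to the vertex x = (1, 0): the fitter coin takes over,
# consistent with there being no interior rest point (hence no interior ESS).
print(round(x1, 3), round(x2, 3))   # 1.0 0.0
```

Since the fitnesses are constants, the interior condition a = fbar = b can only hold when a = b, so for any a ≠ b the flow must end at a vertex.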
2E58DC6B-90D3-4D25-A556-A7EAE7C6B729.png
So I’m still having trouble seeing this analogy.
New evidence doesn't depend on the prior probability at all, but in the replicator picture the fitness landscape does seem to depend on the population state. The analogy doesn't hold.
"Bayesian inference is a special case, formally, of the discrete replicator dynamic, since the fitness landscape in each coordinate may depend on the entire population distribution rather than only on the proportion of the i-type." The fitness landscape in Bayesian inference doesn't seem to depend even on the proportion of the i-type.
Unless the priors themselves are the parameters, which isn't the standard Bayesian setting that he laid out.
P(E|Hi) doesn’t depend on P(Hi)