Category Theory
Zulip Server
Archive

You're reading the public-facing archive of the Category Theory Zulip server.
To join the server you need an invite. Anybody can get an invite by contacting Matteo Capucci at name dot surname at gmail dot com.
For all things related to this archive refer to the same person.

Stream: theory: applied category theory

Topic: Markov games

Georgios Bakirtzis (Nov 27 2020 at 16:58):

Has anybody worked on Markov games categorically? As in how do agents compose in this setting?

Jules Hedges (Nov 27 2020 at 17:04):

I've done Markov games with open games successfully, although the theory mostly hasn't been worked out. I'll get back to you later or tomorrow

Jules Hedges (Nov 30 2020 at 10:25):

Continuing to be creative with the meaning of "later or tomorrow". This is going to be an absolute nightmare to explain on zulip. I could probably do it in about 20 minutes if we were standing in the same room and there was a whiteboard to sketch string diagrams...

Jules Hedges (Nov 30 2020 at 10:32):

If you know what open games are, I can easily write down the completed definition monolithically, and you can just take my word for it that it can be built compositionally. Consider a Markov game / jointly controlled MDP with state space $S$ , action spaces $A_i$ ( $i \leq N$ ), transition function $q : S \times \prod_i A_i \to S$ , stage payoff functions $U_i : S \times \prod_i A_i \to \mathbb R$ , and geometric discount factor $0 < \beta < 1$ . Define an open game $\mathcal G : (S, \mathbb R^N) \to (S, \mathbb R^N)$ by: set of strategy profiles is $\Sigma_\mathcal G = \prod_i (S \to A_i)$ ; play function is $P_\mathcal G : \Sigma \times S \to S$ , $((\sigma_i), s) \mapsto q (s, (\sigma_i s))$ ; coplay function $C_\mathcal G : \Sigma \times S \times \mathbb R^N \to \mathbb R^n$ , $((\sigma_i), s, (x_i)) \mapsto (U_i (s, (\sigma_j s)) + \beta x_i)$ ; and an equilibrium function that I won't write down but it's basically argmax

Jules Hedges (Nov 30 2020 at 10:36):

Then you take the homset $(X, S) \to (X, S)$ with globular 2-cells, restrict to the fibre with $\Sigma = \prod_i (S \to A_i)$ . That turns out to be a poset. You can define an endofunction on it given by pre-composition with $\mathcal G$ (ie. "add another stage"). Its greatest fixpoint is the Markov game. (In particular there's some kind of metric coinduction going on, where the infinite composition of $C_\mathcal G$ telescopes into the infinite sum $\sum_k \beta^k (U_i \cdots)$ )

Jules Hedges (Nov 30 2020 at 10:40):

The bigger picture is, this might not be the only way to do it, it might also be possible to go through a different formulation of MDPs like https://www.coalg.org/cmcs18/files/2018/04/Feys_Hansen_Moss.pdf , I'm not sure whether it can be adapted to multiple agents. The thing I said, I've never written it down in public except for some Haskell code that I used to numerically verify that it does value iteration correctly. I'm planning to use this for a big attack on multi-agent reinforcement learning

Jules Hedges (Nov 30 2020 at 11:31):

(forgot to mention, most of the times I wrote $\to$ it was for random functions, ie. kleisli arrows of your favourite probability monad)

Georgios Bakirtzis (Nov 30 2020 at 15:45):

Thanks for all this! I also asked because I want to apply it to multi-agent reinforcement learning... namely what does it mean for agents to compose not compose in this settings particularly because a lot of the work uses mealy machines which are problematic to compose

Georgios Bakirtzis (Nov 30 2020 at 15:45):

For what it's worth I was thinking of a controls/robotics venue instead of something more theoretical.

Georgios Bakirtzis (Nov 30 2020 at 15:45):

(I will need to go through this slowly not sure I understand/have the prerequisites for everything you said but will get back to you)

Jules Hedges (Nov 30 2020 at 15:49):

We should arrange a call, if you actually want to understand this. What I wrote was optimised for not taking too long to write, and not ease of understanding

Georgios Bakirtzis (Nov 30 2020 at 15:50):

@Jules Hedges I mean that would be ideal if you have the time, check your private messages

Jade Master (Nov 30 2020 at 18:11):

@Jules Hedges @Giorgos Bakirtzis Is a Markov game something like "a game internal to the category where morphisms are markov kernels"?

Jules Hedges (Nov 30 2020 at 19:23):

Sort of, but the distinguishing property of a Markov game is that players have bounded memory, they can only observe the current state instead of the entire history of the play - that's the game theoretic version of the Markov property

Jules Hedges (Nov 30 2020 at 19:25):

It's pushing it to say that there's a general one-size-fits-all definition of "games over a category", but to the extent that there is, games over the category of Markov kernels describe extensive form games of imperfect information with Bayesian Nash equilibria. (There's some handwaving involved there)

Jade Master (Nov 30 2020 at 22:12):

Interesting. Thanks.