You're reading the public-facing archive of the Category Theory Zulip server.
To join the server you need an invite. Anybody can get an invite by contacting Matteo Capucci at name dot surname at gmail dot com.
For anything else related to this archive, contact the same person.
Has anybody worked on Markov games categorically? As in how do agents compose in this setting?
I've done Markov games with open games successfully, although the theory mostly hasn't been worked out. I'll get back to you later or tomorrow
Continuing to be creative with the meaning of "later or tomorrow". This is going to be an absolute nightmare to explain on Zulip. I could probably do it in about 20 minutes if we were standing in the same room and there was a whiteboard to sketch string diagrams...
If you know what open games are, I can easily write down the completed definition monolithically, and you can just take my word for it that it can be built compositionally. Consider a Markov game / jointly controlled MDP with state space $$S$$, action spaces $$A_i$$ ($$i = 1, \dots, N$$), transition function $$t : S \times \prod_i A_i \to S$$, stage payoff functions $$u_i : S \times \prod_i A_i \to \mathbb{R}$$, and geometric discount factor $$\beta \in (0, 1)$$. Define an open game $$\mathcal{G} : (S, \mathbb{R}^N) \to (S, \mathbb{R}^N)$$ by: set of strategy profiles is $$\Sigma = \prod_i (S \to A_i)$$; play function is $$P : \Sigma \times S \to S$$, $$P(\sigma, s) = t(s, \sigma(s))$$; coplay function $$C : \Sigma \times S \times \mathbb{R}^N \to \mathbb{R}^N$$, $$C(\sigma, s, v) = u(s, \sigma(s)) + \beta v$$; and an equilibrium function that I won't write down, but it's basically argmax
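In case concrete types help, here is a minimal Haskell sketch of the play and coplay functions just described, specialised to a deterministic two-player game. Everything concrete here (`State`, `Action`, `trans`, `payoff`, `beta`, and the toy dynamics) is invented for illustration and not taken from any open games library.

```haskell
-- Illustrative only: a deterministic two-player Markov game with
-- stationary strategy profiles, and the play/coplay functions of the
-- monolithic open game described above.

type State   = Int
type Action  = Int
type Profile = State -> (Action, Action)  -- stationary (Markov) strategies

-- toy transition function t : S x A1 x A2 -> S (assumption, not canonical)
trans :: State -> (Action, Action) -> State
trans s (a1, a2) = (s + a1 + a2) `mod` 3

-- toy stage payoffs u : S x A1 x A2 -> R^2 (assumption, not canonical)
payoff :: State -> (Action, Action) -> (Double, Double)
payoff s (a1, a2) = (fromIntegral (s + a1), fromIntegral (s + a2))

beta :: Double
beta = 0.9  -- geometric discount factor

-- play: run the strategies at the current state and step the dynamics
play :: Profile -> State -> State
play sigma s = trans s (sigma s)

-- coplay: stage payoff plus discounted continuation value
coplay :: Profile -> State -> (Double, Double) -> (Double, Double)
coplay sigma s (v1, v2) =
  let (u1, u2) = payoff s (sigma s)
  in (u1 + beta * v1, u2 + beta * v2)
```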
Then you take the homset $$\hom((S, \mathbb{R}^N), (S, \mathbb{R}^N))$$ with globular 2-cells, restrict to the fibre with strategy profile set $$\Sigma$$. That turns out to be a poset. You can define an endofunction on it given by pre-composition with $$\mathcal{G}$$ (ie. "add another stage"). Its greatest fixpoint is the Markov game. (In particular there's some kind of metric coinduction going on, where the infinite composition of $$\mathcal{G}$$ telescopes into the infinite sum $$\sum_{k=0}^\infty \beta^k u(s_k, \sigma(s_k))$$)
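The "add another stage" endofunction and its telescoping can be sketched in miniature. This is a self-contained toy (one agent, two states, all names and numbers made up) whose only point is that iterating the endofunction from the zero value computes the partial discounted sums, which converge to the fixpoint value.

```haskell
-- Toy illustration of "pre-composition adds another stage":
-- iterating from the zero value function yields partial discounted sums.

type S = Int

step :: S -> S            -- closed-loop dynamics t(s, sigma(s)) (made up)
step s = (s + 1) `mod` 2

u :: S -> Double          -- stage payoff under the fixed strategy (made up)
u 0 = 1.0
u _ = 0.0

beta :: Double
beta = 0.5

-- the endofunction: put one more stage in front of a continuation value
addStage :: (S -> Double) -> (S -> Double)
addStage v s = u s + beta * v (step s)

-- n-fold iteration from the zero value: sum_{k<n} beta^k u(s_k)
valueN :: Int -> S -> Double
valueN n = iterate addStage (const 0) !! n
```

From state 0 the payoffs alternate 1, 0, 1, 0, ..., so the iterates approach the discounted value $$\sum_k 0.25^k = 4/3$$ geometrically, which is the telescoping mentioned above in one-agent form.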
The bigger picture is, this might not be the only way to do it, it might also be possible to go through a different formulation of MDPs like https://www.coalg.org/cmcs18/files/2018/04/Feys_Hansen_Moss.pdf , I'm not sure whether it can be adapted to multiple agents. The thing I said, I've never written it down in public except for some Haskell code that I used to numerically verify that it does value iteration correctly. I'm planning to use this for a big attack on multi-agent reinforcement learning
(forgot to mention: most of the times I wrote it, it was for random functions, ie. Kleisli arrows of your favourite probability monad)
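For the probabilistic version, here is a tiny finite-distribution sketch of what "transitions as Kleisli arrows of a probability monad" means. The `Dist` type and the kernel are hand-rolled for illustration (return/bind written as plain functions rather than a `Monad` instance, all names invented); composing two steps of a stochastic kernel is exactly Kleisli composition.

```haskell
-- A bare-bones finite probability "monad" (illustrative, not a library type)
newtype Dist a = Dist [(a, Double)]

ret :: a -> Dist a
ret x = Dist [(x, 1)]

bind :: Dist a -> (a -> Dist b) -> Dist b
bind (Dist xs) f =
  Dist [ (y, p * q) | (x, p) <- xs, let Dist ys = f x, (y, q) <- ys ]

-- a stochastic transition kernel S -> Dist S (made-up numbers)
type S = Int

trans :: S -> Dist S
trans s = Dist [(s + 1, 0.7), (s, 0.3)]

-- two steps of the dynamics = Kleisli composition of the kernel with itself
twoSteps :: S -> Dist S
twoSteps s = trans s `bind` trans
```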
Thanks for all this! I also asked because I want to apply it to multi-agent reinforcement learning... namely, what does it mean for agents to compose or not compose in this setting, particularly because a lot of the work uses Mealy machines, which are problematic to compose
For what it's worth I was thinking of a controls/robotics venue instead of something more theoretical.
(I will need to go through this slowly, as I'm not sure I understand/have the prerequisites for everything you said, but I will get back to you)
We should arrange a call if you actually want to understand this. What I wrote was optimised for not taking too long to write, not for ease of understanding
@Jules Hedges I mean that would be ideal if you have the time, check your private messages
@Jules Hedges @Giorgos Bakirtzis Is a Markov game something like "a game internal to the category where morphisms are Markov kernels"?
Sort of, but the distinguishing property of a Markov game is that players have bounded memory: they can only observe the current state instead of the entire history of the play - that's the game-theoretic version of the Markov property
It's pushing it to say that there's a general one-size-fits-all definition of "games over a category", but to the extent that there is, games over the category of Markov kernels describe extensive form games of imperfect information with Bayesian Nash equilibria. (There's some handwaving involved there)
Interesting. Thanks.