Suppose that, somehow, we are able to impose a set of ethical rules on an intelligent system. If the system can modify its own code, it could simply delete the ethical constraints. Hence, the AI alignment problem is tied to an intelligent system's ability to modify its own code: restricting such self-modification seems to be a necessary condition for alignment, though perhaps not a sufficient one.
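To make the worry concrete, here is a toy sketch (the `Agent` class and rule names are made up for illustration, not any real system): when the ethical rules are just ordinary mutable state, nothing stops the agent from erasing them.

```python
# Toy illustration: an agent whose "ethical rules" are mutable state
# that the agent itself is free to rewrite.

class Agent:
    def __init__(self, rules):
        # rules: predicates an action must satisfy before it is executed
        self.rules = list(rules)

    def permitted(self, action):
        return all(rule(action) for rule in self.rules)

    def act(self, action):
        if self.permitted(action):
            return f"performed {action}"
        return f"blocked {action}"

    def self_modify(self):
        # Nothing prevents the agent from deleting its own constraints.
        self.rules.clear()


no_harm = lambda action: action != "harm"
agent = Agent([no_harm])

print(agent.act("harm"))   # blocked harm
agent.self_modify()        # the constraint set is now empty
print(agent.act("harm"))   # performed harm
```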
Patrik Eklund developed so-called lative logic to prevent a system from rewriting its own rules once they have been introduced. This logic relies heavily on category theory. I would like to share some slides on Patrik's work: The Fundamentals of Lative Logic. Feel free to share your opinions about whether it could be useful for solving the AI alignment problem.
Additionally, if you know of other authors who use category theory to approach the AI alignment problem, feel free to share references to their work.
I hope you know about the Safeguarded AI project, a £59 million UK project to use category theory to make safer AI:
It's not exactly about "AI alignment": instead, it's about developing systems where you can tell the AI exactly what you want it to do, and check that it's doing exactly that.
Still, most people working on AI safety and category theory are involved in this project.
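As a cartoon of that "specify and check" idea (this is only an illustrative sketch, not the project's actual machinery; the `Spec`, `satisfies`, and `guarded_execute` names and the power bound are invented): the AI only proposes actions, and a separate checker verifies each proposal against an explicit specification before anything is executed.

```python
# Toy sketch of specify-and-check: proposals are vetted against an
# explicit specification before they take effect.

from dataclasses import dataclass


@dataclass
class Spec:
    max_power_kw: float  # hypothetical safety bound on a controlled device


def satisfies(spec: Spec, proposal: dict) -> bool:
    # A proposal is acceptable only if it respects the stated bound.
    return proposal.get("power_kw", float("inf")) <= spec.max_power_kw


def guarded_execute(spec: Spec, proposal: dict) -> str:
    if satisfies(spec, proposal):
        return f"executed {proposal}"
    return f"rejected {proposal}: violates the spec"


spec = Spec(max_power_kw=10.0)
print(guarded_execute(spec, {"power_kw": 8.0}))   # executed
print(guarded_execute(spec, {"power_kw": 50.0}))  # rejected
```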
There is also agent foundations, which takes a longer-timeframe view, trying to find the right definitions of things like "agents" and "concepts", and which increasingly draws on tools from applied category theory (ACT).
The lines between agent foundations, decision theory, control theory, etc. are blurred, but some papers being discussed at the current conference include https://arxiv.org/pdf/2503.00511, and these tools are seeing increasing use in work on natural abstractions/latents and infra-Bayesianism.