Suppose that, somehow, we are able to impose a set of ethical rules on an intelligent system. If the system can modify its own code, it could simply delete the ethical constraints. Hence, the AI alignment problem is tied to an intelligent system's ability to modify its own code: restricting such self-modification seems to be a necessary condition for alignment, though perhaps not a sufficient one.
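To make the worry concrete, here is a toy sketch (the `Agent` class and rule names are made up for illustration, not any real system): when the ethical rules are just ordinary mutable state, nothing stops the agent from erasing them.

```python
# Toy illustration: an agent whose "ethical rules" are mutable state
# that the agent itself is free to rewrite.

class Agent:
    def __init__(self, rules):
        # rules: predicates an action must satisfy before it is executed
        self.rules = list(rules)

    def permitted(self, action):
        return all(rule(action) for rule in self.rules)

    def act(self, action):
        if self.permitted(action):
            return f"performed {action}"
        return f"blocked {action}"

    def self_modify(self):
        # Nothing prevents the agent from deleting its own constraints.
        self.rules.clear()


no_harm = lambda action: action != "harm"
agent = Agent([no_harm])

print(agent.act("harm"))   # blocked harm
agent.self_modify()        # the constraint set is now empty
print(agent.act("harm"))   # performed harm
```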
Patrik Eklund developed so-called lative logic to prevent a system from rewriting its own rules once they have been introduced. This logic relies heavily on category theory. I would like to share some slides on Patrik's work: The Fundamentals of Lative Logic. Feel free to share your opinions about whether it could be useful for solving the AI alignment problem.
Additionally, if you know of other authors who use category theory to approach the AI alignment problem, feel free to share references to their work.
I hope you know about the Safeguarded AI project, a £59 million UK project to use category theory to make safer AI:
It's not exactly about "AI alignment": instead, it's about developing systems where you can tell the AI exactly what you want it to do, and check that it's doing exactly that.
Still, most people working on AI safety and category theory are involved in this project.
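As a cartoon of that "specify and check" idea (this is only an illustrative sketch, not the project's actual machinery; the `Spec`, `satisfies`, and `guarded_execute` names and the power bound are invented): the AI only proposes actions, and a separate checker verifies each proposal against an explicit specification before anything is executed.

```python
# Toy sketch of specify-and-check: proposals are vetted against an
# explicit specification before they take effect.

from dataclasses import dataclass


@dataclass
class Spec:
    max_power_kw: float  # hypothetical safety bound on a controlled device


def satisfies(spec: Spec, proposal: dict) -> bool:
    # A proposal is acceptable only if it respects the stated bound.
    return proposal.get("power_kw", float("inf")) <= spec.max_power_kw


def guarded_execute(spec: Spec, proposal: dict) -> str:
    if satisfies(spec, proposal):
        return f"executed {proposal}"
    return f"rejected {proposal}: violates the spec"


spec = Spec(max_power_kw=10.0)
print(guarded_execute(spec, {"power_kw": 8.0}))   # executed
print(guarded_execute(spec, {"power_kw": 50.0}))  # rejected
```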
There is also agent foundations, which takes a longer-timeframe view, trying to find the right definitions of things like "agents" and "concepts", and which increasingly draws on tools from applied category theory (ACT).
The lines between agent foundations, decision theory, control theory, etc. are blurred, but some papers being discussed at the current conference include https://arxiv.org/pdf/2503.00511, and these tools are seeing increasing use in work on natural abstractions/latents and infra-Bayesianism.