Category Theory
Zulip Server
Archive

You're reading the public-facing archive of the Category Theory Zulip server.
To join the server you need an invite. Anybody can get an invite by contacting Matteo Capucci at name dot surname at gmail dot com.
For all things related to this archive refer to the same person.


Stream: theory: applied category theory

Topic: Categorification of storages, backups and shardings?


view this post on Zulip Ignat Insarov (Oct 22 2024 at 09:02):

Say I run an e-commerce web site that sells gadgets. It is implemented as an HTTP server program X that can write stuff to a PostgreSQL database Y or to an ElasticSearch installation Z. Then:

Here, Y and Z are two backups of the same data. Here is another possible situation:

Whenever I see words and and or, I think there must be a category with sums and products in play. But in this case I cannot think of a way to formalize this intuition.

For one approach, we can say that objects are storages and arrows are connections, and we can ask that connections compose associatively. So, for example, X would have composite connection to Y₁ through Y. And maybe we can say that now there is a «virtual» storage (Y₁ + Y₂) × Z that X is connected to by virtue of being connected to Y and to Z.

One aspect this model does not account for is the difference between potentiality and actuality at a given time. Z can in principle hold any element of the set of gadgets, but Z is not the set of gadgets — at any given time it is some subset of that set. And X is either the empty set (when the HTTP server is idling) or a singleton set holding one gadget (when the HTTP server is working) at any given time, even though in potentiality it holds all possible gadgets.

Where can I go from here?

view this post on Zulip David Egolf (Oct 22 2024 at 17:23):

If I understand you correctly, you first note that we can form a category from "storages". We start as follows:

It's not totally clear to me what exactly a "connection" is. I have at least two questions about that: (1) Can we have a connection from AA to BB without a connection from BB to AA? (2) Can we have multiple distinct connections from AA to BB?

At any rate, we end up with a (directed multi-)graph of storages and connections, and from that we can freely generate a category.

view this post on Zulip David Egolf (Oct 22 2024 at 17:23):

However, you then note that this category only describes (to some level of detail) how information can flow around between our storages. It doesn't keep track of the particular state of any storage at some moment in time.

view this post on Zulip David Egolf (Oct 22 2024 at 17:32):

I don't know enough about this kind of thing to give specific suggestions. (Maybe someone else can chime in though?)

Here are some resources that might possibly be helpful to you, that talk about ways to model things that change over time using categories:

(I haven't read these yet - just bits and pieces of some of them. But I would like to read them properly someday!)

view this post on Zulip David Egolf (Oct 22 2024 at 17:38):

On a side note, I find it interesting that two kinds of information flow are present here: reading and writing. This reminds me of medical imaging, where a transducer can both transmit and receive. It seems like there ought to be a way to dream up a category where objects involve two-way interactions... but I'm not sure what morphisms should be! (What should a morphism be between two reading-writing systems? What should a morphism be between two transmitting-receiving systems?)

view this post on Zulip Ignat Insarov (Oct 22 2024 at 18:14):

Thank you David. These are great questions.

Say we have a logic. Then:

By analogy, we may say that:

Either way, connections are not generally symmetric. In some situations, like say a blockchain, we have multiple independent communicating agents that all have connection to one another. In other situations, like say an e-commerce installation, there is a hierarchy with few if any cycles.

One way we can think about this «duality» of reading and writing is by saying that there are two opposite categories — one where arrows are reading connections and another where arrows are writing connections. Maybe this should even be one single dagger category. But I am not ready to do any meaningful mathematical work in this direction yet.

None of this brings us closer to understanding transactions as they happen over time. The straightforward way would be to ask for a time-indexed sequence of categories defined as above — but we still need some secret sauce in order to be able to say that at time t₁ A knows about gadget g but B does not, and at time t₂ both A and B know about g. The things we may wish to prove here are like so:

Practically, in a non-trivial installation there will be many storages, some storages will store more than one type of information, and not all storages will be allowed to store information for time intervals longer than some value. For example:

I have no idea how to formalize any of this.

I read some of David Jaz Myers's book, and I read some of Brendan Fong and David Spivak's book. But their ideas do not live in me yet. I can draw composition of lenses if you wake me at 3 AM — I have been trying to apply this stuff for the last two years. But still it is not easy to explain how, for example, a non-trivial React web app is a composite lens. I have not seen John Baez's series of blog posts before — it looks interesting!

view this post on Zulip Julius Hamilton (Nov 14 2024 at 17:14):

Ok. I’m curious, why do Y1Y_1 and Y2Y_2 have identical schemas in this scenario - is it like for “data resilience”, in case one goes down or gets deleted?

In the type of categorical data science which I am learning, created by @Ryan Wisnesky and others, schemas are logical (“regular”) theories which have categories as their semantics (I believe), and an instance of a schema is a Set-valued functor on that category. (I am still learning, this may not be accurate).

Once you have a “database” (which is a schema and an instance), you can run queries to manipulate/transform/migrate that data. I am pretty sure that queries are actually natural transformations.

For one approach, we can say that objects are storages and arrows are connections, and we can ask that connections compose associatively. So, for example, X would have composite connection to Y₁ through Y. And maybe we can say that now there is a «virtual» storage (Y₁ + Y₂) × Z that X is connected to by virtue of being connected to Y and to Z.

Let’s say a “storage” is an instance. We can update and query these instances, with SQL commands like Insert and Select.

One thing your question reminds me of is a question of my own about “data synchronization”. I was also recently thinking about a scenario where I wanted updates to one instance to be automatically passed to other instances. In SQL, I think this is done with “triggers”. In Categorical Query Language, I don’t know if this is something that can be implemented or not.

In categorical query language, migrations are written using the transform keyword (I think). And I do believe transformations can be composed! Let’s hope Ryan gives us some input.

One aspect this model does not account for is the difference between potentiality and actuality at a given time. Z can in principle hold any element of the set of gadgets, but Z is not the set of gadgets — at any given time it is some subset of that set. 

I think categorical query language can handle this. First, you’ll define the schema of ZZ, which you said was “ElasticSearch”. In CQL, a schema is defined essentially using types, functions, and equations. I don’t know the structure of ElasticSearch, but we can work on it together if you like. CQL comes with many commonly used schemas already defined. For example, it already has the schema of Apache TinkerPop graph databases, so you can load an actual TinkerPop database into CQL and transform that data in various ways. If you define the schema of your e-commerce data, and of ElasticSearch, you can define “mappings” (CQL’s keyword for functors) which will transform data between the different schemas. Sometimes the mapping will be invertible, so you might be able to synchronize back and forth between ZZ and YY, I don’t know.

When you say “potentiality and actuality”, I think you are thinking about the difference between a schema, which would include a type like “Gadget”, and an instance, which would specify which gadgets some instance II has.

Where can I go from here?

Trying to model this in CQL could be a beneficial exercise. I’m happy to help.

view this post on Zulip Ryan Wisnesky (Nov 14 2024 at 19:04):

Remember that all models are wrong, but some models are useful - to determine how to formalize, one must know what one wants to do with the formalization. For example, this could be implemented, and proved correct, in Coq - even to the level of reasoning about HTTP protocols and file systems (see the paper 'certified web services in ynot' for example). Or you could formalize a series of updates using techniques from database theory, or CQL; there's several different ways "updates" in SQL can be formalized, even in the Spivak body of work. Sometimes if I don't know what I want to do with the formalization, I pick "run a simulation" as a default - could you implement a toy simulator of your system as a state machine, or Petri net, or python code, or ... ?