Category Theory
Zulip Server
Archive

You're reading the public-facing archive of the Category Theory Zulip server.
To join the server you need an invite. Anybody can get an invite by contacting Matteo Capucci at name dot surname at gmail dot com.
For all things related to this archive refer to the same person.

Stream: practice: software

Topic: CQL

Julius Hamilton (Aug 13 2024 at 18:19):

I’m gonna try to understand CQL a bit better and we can have a thread for CQL here in general.

https://www.categoricaldata.net

https://arxiv.org/abs/1503.03571

https://www.categoricaldata.net/tutorial.html

I will expand on various thoughts I have had. I will most definitely go into sub-questions I have supporting my understanding of CQL.

Ryan said that calling CQL “a query language ended up being a misnomer; we found that what most query languages do is compute limits, whereas what CQL does uniquely is compute colimits”.

I’m still working on understanding limits, and this is my current understanding.

I think an important thing about a cone is that not only is it an object with a morphism to every object in a diagram, but it includes all the composition of morphisms. So if you have a cone $C$ from $X$ to (/over) a diagram $D$ , and a morphism $f: X \to d_1$ , for $d_1 \in D$ , and there is a morphism $\delta: d_1 \to d_2$ in $D$ , then the cone $C$ must also include a map $g: X \to d_2$ , such that $g = \delta \circ f$ .

I wonder if there’s a concise way to express this property of a cone - it “contains all possible compositions with morphisms in $D$ ”. I am thinking that a cone is like a multispan which has been “upgraded/promoted” to take into account the morphisms between the objects of the multispan. I wonder if there is a way to associate $g = \delta \circ f$ with a map from $f$ to $\delta$ - a map between maps.

Maybe there’s a good example in the category of sets to see how a multispan versus a cone appear to have a different character. In $\mathbf{Set}$ , I believe there is always at least one map between any pair of objects, except from any object to the initial object. Maybe we can think of some common functions on some common domains… $f(x) = x^2, x^3$ , for $\mathbb{N}, \mathbb{R}$ , etc. I haven’t thought about this before, I’ll come back to it.

A limit is a universal cone. I think the idea here is also subtle, I should do some examples with sets, maybe other categories too like $\mathbf{Top}$ . I think of it as, “For a given category, there are a number of possible diagrams. For a given diagram, there are a number of cones. For the collection of cones, there may or may not be one that is “universal” amongst them.”

I am currently thinking of anything “universal” as defined as, “for anything $X$ , the universal thing $Y$ is the one that they can all factor into”. It means that for all the morphisms from $X$ to the objects of diagram $D$ , every single one admits a factorization into a map $X \to L$ , then from $L \to X$ . (It’s interesting how when we are checking if a diagram constitutes a category, we have to check if the composition of all composable maps is included in the diagram. This is a way of “putting two morphisms together to create a third”. Whereas when we are considering if a given morphism admits a factorization, we are asking “can this morphism be broken into smaller ones?”. I wonder if there are interesting ways to “present the fundamental data of a category” going in either direction, perhaps - either just the most fundamental morphisms, implying the category is generated by their compositional closure - or, some collection of morphisms, with a particular property, where we are told they all factorize in a particular way.)

I am wondering if we can use limits in some way, like to “simplify” or “reduce” a category. With products, it seems like $A \times B$ has all the information about maps into $A$ and into $B$ . (But they have to be from the same source, which is interesting). Is there a way we can just get rid of objects $A$ and $B$ then, without information loss?

Anyway: to say a query language can compute limits, for a database… I’m trying to think of a simple example.

schema.jpg

It seems like here we can think of objects as sets of particular kinds of things / types, and morphisms as maps on those sets.

Maybe we can consider the diagram from “Department” to “String” via “name”.

I think there are four cones: the object “Dep” acts as the apex of a cone, since it maps to itself and to “String” via “name”, and all morphisms should commute.

The other cones are from apex “Emp”. We can choose either the first name or last name map into String, or both, for a cone. (I think).

The question is, amongst the cones, is one of them “universal”? I think the one with both name morphisms from “Emp” might be, but I’m not sure. It seems to suggest that for each department, we technically can represent that department by its secretary. So while for any department, we can determine its name; we can “expand the view” and it turns out that for any employee, we can figure out their department, and then that department’s name…

Is this related to the idea? Do limits, in categorical data, let us find a “most informative” point of view? Do we want that, so we can “prune away” the redundancies?

(And then, the question of colimits comes, later.)

Thanks!

Emilio Minichiello (Aug 13 2024 at 19:22):

I got a little lost in your question, maybe you could try and state it more succinctly in a sentence or two. Here's some of my thoughts on your thoughts. The best place to start learning about categorical database theory is David's paper. In that paper he put's forward what I'll call the unattributed model of categorical database theory. Nowadays we have a more nuanced model called the algebraic data model, which is explained beautifully in Algebraic Databases. This is the mathematical model that CQL actually runs on. But its a good deal more complex than the unattributed model, and I can already say several things about the ideas using that model, so let's stick with it.

Here's how limits and colimits come into database theory: In the unattributed model, a schema $U$ is a category. (If we want to be picky, we should really say its a finite category presentation, that way we can put it into a computer). An instance on $U$ is a functor $I : U \to \mathbf{Set}$ . So I have a whole category $U\mathbf{Inst} = \mathbf{Set}^U$ whose objects are functors and whose morphisms are natural transformations. So if I have a functor $F: V \to U$ , then I get a functor $\Delta_F : U\mathbf{Inst} \to V\mathbf{Inst}$ by precomposing with $F$ . This functor has left and right adjoints $\Sigma_F$ and $\Pi_F$ , and computing these is done by computing certain colimits and limits, respectively. In particular they are Kan extensions.

It turns out (by work of Spivak, Wisnesky) that you can implement typical SELECT FROM WHERE queries using these functors $\Delta, \Sigma, \Pi$ . So the moral is: categorical operations like limits and colimits (sometimes) coincide with useful database operations.

Emilio Minichiello (Aug 13 2024 at 19:26):

Now your idea there at the end, of cones inside the schema, is not what is considered in the papers above, but it is an interesting idea that is considered heavily in another model of categorical database theory, namely the EASIK or Sketch data model of Rosebrugh and Johnson. Here the idea is to model database constraints on a instance using (co)cones in the schema, and requiring the instance as a functor to turn these (co)cones into (co)limits of sets.

Emilio Minichiello (Aug 13 2024 at 19:36):

I'll also give some more details on what Ryan said about CQL computing colimits, but it requires some more knowledge of category theory. In the algebraic data model, schemas are modelled using (presentations of) certain kinds of algebraic theories. Instances are then (presentations of) models of algebraic theories, i.e. product-preserving functors to $\mathbf{Set}$ . Thus for a schema $U$ we again get a category $U\mathbf{Inst}$ , whose objects are instances and morphisms are natural transformations. It turns out that $U \mathbf{Inst}$ is an algebraic category, which while not being as nice as $\mathbf{Set}^U$ , it is still really quite nice, having all limits and colimits. Now without diving into too much more detail, limits turn out to be easy to compute in this case, but colimits are trickier. CQL excels at computing the colimits in these categories. The way it does it is by using theorem provers and term rewriting theory to iteratively glue together generators of the instances. This ends up being a really cool way to compute a classical database algorithm called the chase, more on that here.

Julius Hamilton (Aug 16 2024 at 23:59):

Thank you.

So $U\mathbf{Inst} = \mathbf{Set}^U$ is the category of functors from a schema $U$ to $\mathbf{Set}$ . This functor makes the database schema "concrete" by providing "actual data" for the schema. For example, it might map an object $\mathrm{People} \in Ob(U)$ to a set of people in $\mathbf{Set}$ .

For a functor $F : V \to U$ , from one schema to another, there is an induced (?) functor from $U$ 's category of functors into $\mathbf{Set}$ to $V$ 's category of functors into $\mathbf{Set}$ . It took me a second to get this, but now I do: since $U\mathbf{Inst}$ is the category of functors from $U$ to $\mathbf{Set}$ , then for every functor $X : U \to \mathbf{Set}$ , (pre)compose it with $F : V \to U$ . ("Pre"compose just means compose, but highlights that it is the next mentioned thing which comes first in the composition.) Now every functor in $U\mathbf{Inst}$ has become a functor $X \circ F : V \to \mathbf{Set}$ .

This induced functor $\Delta_F$ exists for any choice of $F: V \to U$ . My guess / to be proven: every functor $F$ determines a unique functor $\Delta_F$ .

I'll try to prove the statement, "This functor has left and right adjoints."

Julius Hamilton (Aug 17 2024 at 00:15):

In order to prove that $\Delta_F$ has a left adjoint, which is called $\Sigma_F$ , I need to show that for all $X_U \in U\mathbf{Inst}$ and $X_V \in V\mathbf{Inst}$ , $Hom(\Delta_F(X_U), X_V) \cong Hom(X_U, \Sigma_F(X_V))$ .

In order to show that, I need to define what functor $\Sigma_F$ does. It is the left Kan-extension of $\Delta_F$ . I'll study the definition of a left Kan extension.

Emilio Minichiello (Aug 17 2024 at 12:07):

Yes, exactly! Only thing is that $\Delta_F$ is right adjoint to $\Sigma_F$ so it has to appear on the right side of the Homs, as follows. Let $X \in U\mathbf{Inst}$ and $Y \in V\mathbf{Inst}$ , then we have

$V\mathbf{Inst}(\Sigma_F(X), Y) \cong U\mathbf{Inst}(X, \Delta_F(Y)).$

David Spivak's paper gives a really friendly introduction to these ideas, and especially Kan extensions, I'd highly recommend taking a look at it.

Emilio Minichiello (Aug 17 2024 at 12:10):

Actually I think this older version of that paper gives an even nicer introduction to these ideas, but note that he uses different notation for the left and right Kan extensions here.