Category Theory
Zulip Server
Archive

You're reading the public-facing archive of the Category Theory Zulip server.
To join the server you need an invite. Anybody can get an invite by contacting Matteo Capucci at name dot surname at gmail dot com.
For all things related to this archive refer to the same person.


Stream: practice: software

Topic: CQL


view this post on Zulip Julius Hamilton (Aug 13 2024 at 18:19):

I’m gonna try to understand CQL a bit better and we can have a thread for CQL here in general.

https://www.categoricaldata.net

https://arxiv.org/abs/1503.03571

https://www.categoricaldata.net/tutorial.html

I will expand on various thoughts I have had. I will most definitely go into sub-questions I have supporting my understanding of CQL.

Ryan said that calling CQL “a query language ended up being a misnomer; we found that what most query languages do is compute limits, whereas what CQL does uniquely is compute colimits”.

I’m still working on understanding limits, and this is my current understanding.

I think an important thing about a cone is that not only is it an object with a morphism to every object in a diagram, but it includes all the composition of morphisms. So if you have a cone CC from XX to (/over) a diagram DD, and a morphism f:Xd1f: X \to d_1, for d1Dd_1 \in D, and there is a morphism δ:d1d2\delta: d_1 \to d_2 in DD, then the cone CC must also include a map g:Xd2g: X \to d_2, such that g=δfg = \delta \circ f.

I wonder if there’s a concise way to express this property of a cone - it “contains all possible compositions with morphisms in DD”. I am thinking that a cone is like a multispan which has been “upgraded/promoted” to take into account the morphisms between the objects of the multispan. I wonder if there is a way to associate g=δfg = \delta \circ f with a map from ff to δ\delta - a map between maps.

Maybe there’s a good example in the category of sets to see how a multispan versus a cone appear to have a different character. In Set\mathbf{Set}, I believe there is always at least one map between any pair of objects, except from any object to the initial object. Maybe we can think of some common functions on some common domains… f(x)=x2,x3f(x) = x^2, x^3, for N,R\mathbb{N}, \mathbb{R}, etc. I haven’t thought about this before, I’ll come back to it.

A limit is a universal cone. I think the idea here is also subtle, I should do some examples with sets, maybe other categories too like Top\mathbf{Top}. I think of it as, “For a given category, there are a number of possible diagrams. For a given diagram, there are a number of cones. For the collection of cones, there may or may not be one that is “universal” amongst them.”

I am currently thinking of anything “universal” as defined as, “for anything XX, the universal thing YY is the one that they can all factor into”. It means that for all the morphisms from XX to the objects of diagram DD, every single one admits a factorization into a map XLX \to L, then from LXL \to X. (It’s interesting how when we are checking if a diagram constitutes a category, we have to check if the composition of all composable maps is included in the diagram. This is a way of “putting two morphisms together to create a third”. Whereas when we are considering if a given morphism admits a factorization, we are asking “can this morphism be broken into smaller ones?”. I wonder if there are interesting ways to “present the fundamental data of a category” going in either direction, perhaps - either just the most fundamental morphisms, implying the category is generated by their compositional closure - or, some collection of morphisms, with a particular property, where we are told they all factorize in a particular way.)

I am wondering if we can use limits in some way, like to “simplify” or “reduce” a category. With products, it seems like A×BA \times B has all the information about maps into AA and into BB. (But they have to be from the same source, which is interesting). Is there a way we can just get rid of objects AA and BB then, without information loss?

Anyway: to say a query language can compute limits, for a database… I’m trying to think of a simple example.

schema.jpg

It seems like here we can think of objects as sets of particular kinds of things / types, and morphisms as maps on those sets.

Maybe we can consider the diagram from “Department” to “String” via “name”.

I think there are four cones: the object “Dep” acts as the apex of a cone, since it maps to itself and to “String” via “name”, and all morphisms should commute.

The other cones are from apex “Emp”. We can choose either the first name or last name map into String, or both, for a cone. (I think).

The question is, amongst the cones, is one of them “universal”? I think the one with both name morphisms from “Emp” might be, but I’m not sure. It seems to suggest that for each department, we technically can represent that department by its secretary. So while for any department, we can determine its name; we can “expand the view” and it turns out that for any employee, we can figure out their department, and then that department’s name…

Is this related to the idea? Do limits, in categorical data, let us find a “most informative” point of view? Do we want that, so we can “prune away” the redundancies?

(And then, the question of colimits comes, later.)

Thanks!

view this post on Zulip Emilio Minichiello (Aug 13 2024 at 19:22):

I got a little lost in your question, maybe you could try and state it more succinctly in a sentence or two. Here's some of my thoughts on your thoughts. The best place to start learning about categorical database theory is David's paper. In that paper he put's forward what I'll call the unattributed model of categorical database theory. Nowadays we have a more nuanced model called the algebraic data model, which is explained beautifully in Algebraic Databases. This is the mathematical model that CQL actually runs on. But its a good deal more complex than the unattributed model, and I can already say several things about the ideas using that model, so let's stick with it.

Here's how limits and colimits come into database theory: In the unattributed model, a schema UU is a category. (If we want to be picky, we should really say its a finite category presentation, that way we can put it into a computer). An instance on UU is a functor I:USetI : U \to \mathbf{Set}. So I have a whole category UInst=SetUU\mathbf{Inst} = \mathbf{Set}^U whose objects are functors and whose morphisms are natural transformations. So if I have a functor F:VUF: V \to U, then I get a functor ΔF:UInstVInst\Delta_F : U\mathbf{Inst} \to V\mathbf{Inst} by precomposing with FF. This functor has left and right adjoints ΣF\Sigma_F and ΠF\Pi_F, and computing these is done by computing certain colimits and limits, respectively. In particular they are Kan extensions.

It turns out (by work of Spivak, Wisnesky) that you can implement typical SELECT FROM WHERE queries using these functors Δ,Σ,Π\Delta, \Sigma, \Pi. So the moral is: categorical operations like limits and colimits (sometimes) coincide with useful database operations.

view this post on Zulip Emilio Minichiello (Aug 13 2024 at 19:26):

Now your idea there at the end, of cones inside the schema, is not what is considered in the papers above, but it is an interesting idea that is considered heavily in another model of categorical database theory, namely the EASIK or Sketch data model of Rosebrugh and Johnson. Here the idea is to model database constraints on a instance using (co)cones in the schema, and requiring the instance as a functor to turn these (co)cones into (co)limits of sets.

view this post on Zulip Emilio Minichiello (Aug 13 2024 at 19:36):

I'll also give some more details on what Ryan said about CQL computing colimits, but it requires some more knowledge of category theory. In the algebraic data model, schemas are modelled using (presentations of) certain kinds of algebraic theories. Instances are then (presentations of) models of algebraic theories, i.e. product-preserving functors to Set\mathbf{Set}. Thus for a schema UU we again get a category UInstU\mathbf{Inst}, whose objects are instances and morphisms are natural transformations. It turns out that UInstU \mathbf{Inst} is an algebraic category, which while not being as nice as SetU\mathbf{Set}^U, it is still really quite nice, having all limits and colimits. Now without diving into too much more detail, limits turn out to be easy to compute in this case, but colimits are trickier. CQL excels at computing the colimits in these categories. The way it does it is by using theorem provers and term rewriting theory to iteratively glue together generators of the instances. This ends up being a really cool way to compute a classical database algorithm called the chase, more on that here.

view this post on Zulip Julius Hamilton (Aug 16 2024 at 23:59):

Thank you.

So UInst=SetUU\mathbf{Inst} = \mathbf{Set}^U is the category of functors from a schema UU to Set\mathbf{Set}. This functor makes the database schema "concrete" by providing "actual data" for the schema. For example, it might map an object PeopleOb(U)\mathrm{People} \in Ob(U) to a set of people in Set\mathbf{Set}.

For a functor F:VUF : V \to U, from one schema to another, there is an induced (?) functor from UU's category of functors into Set\mathbf{Set} to VV's category of functors into Set\mathbf{Set}. It took me a second to get this, but now I do: since UInstU\mathbf{Inst} is the category of functors from UU to Set\mathbf{Set}, then for every functor X:USetX : U \to \mathbf{Set}, (pre)compose it with F:VUF : V \to U. ("Pre"compose just means compose, but highlights that it is the next mentioned thing which comes first in the composition.) Now every functor in UInstU\mathbf{Inst} has become a functor XF:VSetX \circ F : V \to \mathbf{Set}.

This induced functor ΔF\Delta_F exists for any choice of F:VUF: V \to U. My guess / to be proven: every functor FF determines a unique functor ΔF\Delta_F.

I'll try to prove the statement, "This functor has left and right adjoints."

view this post on Zulip Julius Hamilton (Aug 17 2024 at 00:15):

In order to prove that ΔF\Delta_F has a left adjoint, which is called ΣF\Sigma_F, I need to show that for all XUUInstX_U \in U\mathbf{Inst} and XVVInstX_V \in V\mathbf{Inst}, Hom(ΔF(XU),XV)Hom(XU,ΣF(XV))Hom(\Delta_F(X_U), X_V) \cong Hom(X_U, \Sigma_F(X_V)).

In order to show that, I need to define what functor ΣF\Sigma_F does. It is the left Kan-extension of ΔF\Delta_F. I'll study the definition of a left Kan extension.

view this post on Zulip Emilio Minichiello (Aug 17 2024 at 12:07):

Yes, exactly! Only thing is that ΔF\Delta_F is right adjoint to ΣF\Sigma_F so it has to appear on the right side of the Homs, as follows. Let XUInstX \in U\mathbf{Inst} and YVInstY \in V\mathbf{Inst}, then we have

VInst(ΣF(X),Y)UInst(X,ΔF(Y)).V\mathbf{Inst}(\Sigma_F(X), Y) \cong U\mathbf{Inst}(X, \Delta_F(Y)).

David Spivak's paper gives a really friendly introduction to these ideas, and especially Kan extensions, I'd highly recommend taking a look at it.

view this post on Zulip Emilio Minichiello (Aug 17 2024 at 12:10):

Actually I think this older version of that paper gives an even nicer introduction to these ideas, but note that he uses different notation for the left and right Kan extensions here.