Category Theory
Zulip Server
Archive

You're reading the public-facing archive of the Category Theory Zulip server.
To join the server you need an invite. Anybody can get an invite by contacting Matteo Capucci at name dot surname at gmail dot com.
For all things related to this archive refer to the same person.


Stream: learning: questions

Topic: Multi-sorted equational algebra


view this post on Zulip Julius Hamilton (Feb 02 2025 at 20:21):

I am trying to get a better grasp on studying the structures determined by multisorted equational theories, especially when two theories determine the same structure.

For example, given one type, $T$, one constant, $c : T$, and 2 functions, $f : T \to T$ and $g : T \to T$, and no additional equations, terms of this theory would include $c, f(c), g(c), f(f(c)), g(f(c)), \ldots$

It would be nice to have a general routine for studying the structure of any such theory (given the signature, and some equations).

In this scenario, I seem to be treating function application identically to symbol concatenation. As $c : T$ and $f : T \to T$, I know $f(c)$ is a valid term. I am wondering if I should formalize very precisely what I mean by function application here, though.
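For concreteness, such a "general routine" is easy to sketch for this unary signature; here is a minimal version, assuming terms are represented as plain strings (a sketch, not a fixed formalization):

```python
# Enumerate the closed terms of the signature c : T, f : T -> T, g : T -> T
# up to a given nesting depth; terms are represented as plain strings.
def terms(depth):
    level = {"c"}              # depth 0: just the constant
    all_terms = set(level)
    for _ in range(depth):
        # apply each unary symbol to every term of the previous depth
        level = {h + "(" + t + ")" for h in ("f", "g") for t in level}
        all_terms |= level
    return all_terms

# depth 2 gives 1 + 2 + 4 = 7 terms: c, f(c), g(c), f(f(c)), f(g(c)), ...
assert len(terms(2)) == 7
assert "f(g(c))" in terms(2)
```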

If $T$ were a set, and if I said $T$ is a set such that $c \in T$ and such that $\forall t, t \in T \implies f(t) \in T$, I believe this leaves open the possibility that $f(c) = c$, so $T = \{c\}$. If we think about a function $f : T \to T$ in the set-theoretic sense as $f \subseteq T \times T$ that is left-total and right-unique, then I believe we can present a model of this theory as $\|T\| = \{\|c\|\}$ ("the interpretation of the sort symbol $T$ is the set containing only the interpretation of the constant symbol $c$"), and $\|f\| = \{(\|c\|, \|c\|)\}$. Since $f(c) = c$ and $c \in T$, this structure satisfies the theory.

When one works with term algebras, it seems to be assumed that $f(c) \neq c$, unless that equation is stated in the theory. I guess I feel like this isn't conveyed if we say that $f(c)$ is "the function $f$ applied to the element $c$", if we have not given more information about what other elements there may or may not be in $T$. We know that $f(c)$ is a well-defined expression that refers to some element of $T$, but not which one.

I wonder if it is taken for granted in type theory that if $T$ is a type, and $c$ is a term of type $T$, and $T \to T$ is a function type, and $f : T \to T$ is a term of that function type, then $f\,c$ is a term of type $T$, and there isn't really a question of "what $f\,c$ equals"; it's just $f\,c$? Unless we introduce equality as a type, maybe, to state what $f\,c$ equals?

One simple way to think about associativity, I think, is that "nested expressions get flattened": $f(x, f(y, z)) = f(f(x, y), z)$. If we make the operation $f$ implicit and denote applying it to $x$ and $y$ as $xy$, we see the "flattening": $x(yz) = (xy)z = xyz$.
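This flattening can be checked with any concrete associative operation, e.g. string concatenation standing in for $f$:

```python
# String concatenation is associative, so nested applications "flatten":
f = lambda x, y: x + y

x, y, z = "x", "y", "z"
assert f(x, f(y, z)) == f(f(x, y), z) == "xyz"
```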

Screenshot 2025-02-02 at 1.12.28 PM.png

I think I am observing that in the case of the first-mentioned theory, $c : T, f : T \to T, g : T \to T$, because the terms can be written out as strings ($fggfc, ggffgc, \ldots$), it seems like there is "associativity" going on here; we don't need to designate an order of operations in the strings. But I think this is because these are unary operations.

If we associate $f$ with $0$, $g$ with $1$, and $c$ with the empty string $""$, the terms of this theory become binary strings: $0, 1, 00, 01, 11, 10, \ldots$

So I would like to say that this theory is isomorphic to the free monoid on generators $0, 1$ with identity element $""$, if I may.
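The correspondence can be made concrete: read a term's function symbols off as a binary string, and note that substituting one term for $c$ inside another (which composes the unary maps) corresponds exactly to concatenating the codes, i.e. to free-monoid multiplication. A small sketch, assuming the string representation of terms:

```python
# Read a term's function symbols off as a binary string: f -> 0, g -> 1.
def encode(term):
    return "".join("0" if ch == "f" else "1" for ch in term if ch in "fg")

assert encode("f(g(g(c)))") == "011"
assert encode("c") == ""                 # the constant plays the identity

# Substituting one term for c inside another composes the endomaps, and
# corresponds to concatenating the codes -- free-monoid multiplication.
s, t = "f(g(c))", "g(f(c))"
assert encode(s.replace("c", t)) == encode(s) + encode(t)
```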

I find this interesting yet counter-intuitive, because what were functions in the first theory became elements in the second theory, and what was an implicit idea in the first theory, "function application", became an explicit operation in the second one, the monoid operation.

What is a nice mathematical way to express that this free algebraic theory of 2 unary functions and one constant turns out to be identical to the theory of a free monoid on 2 generators (I think)?

view this post on Zulip Ryan Wisnesky (Feb 02 2025 at 21:46):

That the terms in this theory are lists is an artifact of only having unary function symbols. In general, terms will be trees. And usually, fc is written f(c) in the general case, and the meaning of this as function application is dictated by considering Set-valued functors (standard models). That f(c) != c in the term model is true by construction, not an assumption. (The initial/term model will satisfy theorems that most models don't, such as induction.)
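The unary-vs-general point can be illustrated with a small tree representation of terms (a sketch; the binary symbol $h$ is hypothetical, just to show branching):

```python
from dataclasses import dataclass

# Terms over a general signature as trees: each node is a function symbol
# applied to a tuple of subterms.
@dataclass(frozen=True)
class Term:
    op: str
    args: tuple = ()

c = Term("c")
fc = Term("f", (c,))                    # unary symbols only chain, like lists
t = Term("h", (fc, Term("g", (c,))))    # h(f(c), g(c)) genuinely branches

def size(t):
    # total number of symbol occurrences in the term
    return 1 + sum(size(a) for a in t.args)

assert size(t) == 5   # h, f, c, g, c
```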

view this post on Zulip Julius Hamilton (Feb 08 2025 at 21:09):

Ok.

My goal is to classify CQL theories.

I guess I have a lot of reading and studying to do.

view this post on Zulip Julius Hamilton (Feb 08 2025 at 21:13):

The nLab doesn't have an entry for CQL. I feel like it should. It would be a good way to gather all the essential info one needs to understand it.

view this post on Zulip Julius Hamilton (Feb 08 2025 at 21:20):

Perhaps this is the best self-contained, formal and complete explanation of CQL. Algebraic Databases.

view this post on Zulip Julius Hamilton (Feb 08 2025 at 22:14):

May take some notes on the paper to digest a bit further.

  1. There have been various iterations of categorical database theory. Previously, instances of schemas were defined as functors from a category into the category of sets. Intuitively, one can imagine that if a schema includes a type $T$, the instance-functor associates $T$ with a set of elements which are understood to be of type $T$.

  2. A newer approach uses Lawvere theories.

  3. The concepts of categorical database theory, such as schemas, instances, schema mappings, and queries, can be modeled as proarrow equipments, a.k.a. framed bicategories.

  4. Another approach to categorical database theory by Robert Rosebrugh and collaborators takes schemas as sketches. A sketch is a formalization of the concept of a theory. It generalizes the notion of a Lawvere theory. A sketch is a small category equipped with a set $L$ of cones and a set $C$ of cocones. A small category is a category with a small set of objects and a small set of morphisms. A small set has various definitions depending on the formalization used. It can mean a set and not a proper class. Or, using Grothendieck universes, it can mean a set within a universe $U$. I will certainly be taking a detour into studying some examples of [[sketches]]. If $T$ is a sketch, a model of $T$ in a category $\mathcal{C}$ is a functor taking each cone in $L$ to a limit cone in $\mathcal{C}$, and similarly for cocones.

  5. An alternative definition of sketch is a directed graph with a set of diagrams $D$, a set of cones $L$, and a set of cocones $C$. Example: consider the directed graph $A \to B$. Let $D$ be empty. Because there are no diagrams, any selection of a point is a valid cone (I think), since the point is not required to have any morphisms. Let $L$ contain just the point $A$. Let $C$ be empty. A model of this sketch is a functor $F$ which sends the cone $A$ to a limit cone. For reasons unclear to me (presumably because the limit of the empty diagram is the terminal object), $F(A)$ must be the terminal object $1$, and if this is a functor into $\mathsf{Set}$, $F(B)$ can be any set. This means models of this sketch are pointed sets (sets with a distinguished element).

This is enough to digest. I will explore sketches and their models before continuing.
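As a quick sanity check on the pointed-sets claim in item 5 (a sketch, with an encoding of my own choosing): since $F(A)$ must be a one-element set, a $\mathsf{Set}$-model amounts to a choice of a set $F(B)$ together with a function $F(A) \to F(B)$, i.e. an element of $F(B)$.

```python
from itertools import product

# Models of the sketch (graph A -> B, one empty-diagram cone on A) in
# finite sets: F(A) is forced to be terminal, so a model is a choice of
# an image for "*" in F(B) -- a distinguished element of F(B).
def models(B):
    FA = ("*",)                                   # the terminal set
    return [dict(zip(FA, choice))
            for choice in product(sorted(B), repeat=len(FA))]

B = {0, 1, 2}
assert len(models(B)) == len(B)   # one model per choice of basepoint
```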

view this post on Zulip Ryan Wisnesky (Feb 09 2025 at 00:36):

The reason we had to layer categories on top of algebraic theories to do data migration/integration is the same reason database migration/integration theory isn't just finite model theory: you need to distinguish two kinds of value, those coming from the input data and those generated by the data integration/migration operations. This is unlike querying, where the relational algebra cannot express queries whose output contains values not present in the input. You end up with "unknowns"/"labelled nulls" when integrating/migrating data with colimits/lifting problems that you don't when you just query (limits and coproducts), and to account for this semantics you have to enrich the original Spivak model, the original Rosebrugh model, etc. There are a few approaches to this "attribute problem" (our terminology from 'Algebraic Data Integration'). BTW Rosebrugh's EASIK tool is available inside of CQL; you can push a button to translate CQL to EASIK and vice versa.