Category Theory
Zulip Server
Archive

You're reading the public-facing archive of the Category Theory Zulip server.
To join the server you need an invite. Anybody can get an invite by contacting Matteo Capucci at name dot surname at gmail dot com.
For all things related to this archive refer to the same person.

Stream: theory: applied category theory

Topic: category theory for nested data structures

Davi Sales Barreira (May 03 2022 at 16:54):

Hello, friends. Recently, I've read an article from Spivak (2015) where he talked about how nested data structures had not yet been formalized in category theory. I'm quite interested in this topic, I was wondering if anyone is aware of advances in such front?

John Baez (May 03 2022 at 16:56):

Not me. Maybe someone else here knows. If not, it would be good to email David Spivak and let us know his answer!

Alex Gryzlov (May 03 2022 at 22:19):

As in "nested inductive types"?

Davi Sales Barreira (May 03 2022 at 22:59):

David showed how a schema can be thought of as a category, and thus used it to rigorously define things like data migration. I was wondering if there was something similar, but with nested data structures.

Eric Bond (May 21 2022 at 13:44):

Are you aware of datatypes represented as polynomial functors/containers?
https://www.cs.nott.ac.uk/~psztxa/publ/fossacs03.pdf

Eric Bond (May 21 2022 at 13:45):

http://www.contrib.andrew.cmu.edu/~avigad/Papers/qpf.pdf

Valeria de Paiva (May 21 2022 at 14:04):

Thanks for the Avigad et al paper @Eric Bond that I didn't know about. but 2019 is a bit old too, what happened since? do you know?did they complete their Lean library? has it been ported to agda?

Eric Bond (May 21 2022 at 14:17):

I’m not sure what has happened since. Im sure a fair amount of their constructions have been replicated in Agda. What would be interesting is a reexamination of this paper through the lense of HoTT where we can use higher inductive types or quotient inductive inductive types instead of Lean’s notion of quotient.

Davi Sales Barreira (May 21 2022 at 14:28):

Didn't know about it, @Eric Bond . Thanks for the references.

Jules Hedges (May 21 2022 at 17:41):

I once asked in Strathclyde why containers are not used in proof assistants. If I remember correctly, the answer was that you need functional extensionality to prove the container representation theorem (aka that $\mathbf{DLens} \cong \mathbf{Poly}$ )

Bob Atkey (May 22 2022 at 09:50):

Another reason containers are not used in proof assistants is that the direct implementation isn't very efficient. Using a function to represent a small tuple is usually going to be worse, and involve more unnecessary recomputation, than just storing the values.

Valeria de Paiva (May 22 2022 at 14:37):

Thanks @Bob Atkey and @Jules Hedges , this makes sense. but I wish @Conor McBride would comment, as when this zulip started he seemed to have high hopes for the implementation of polynomials, if I understood correctly what he was saying then. (always a big if...)

Simonas Tutlys (May 23 2022 at 07:42):

There's combinatorial species which is very similar to datataypes-as-functors.Maybe this may be of use since from the abstract it tries to reconcile ADTs and species through HoTT? Don't know about nesting though. Is there any reason it couldn't just be regular composition of functors?

https://www.cis.upenn.edu/~sweirich/papers/yorgey-thesis.pdf

Davi Sales Barreira (May 23 2022 at 11:26):

The idea is the formalize the interplay between the schema category defined by @David Spivak , and a representation of the data as a nested collection. David formalized the notion of a database schema, where one defines the schema as a category S and the instance of a database is a functor (everyone here probably knows this better than me). So what I'm seeking is a way to represent the nested collection in a similar way, such that one can translate an instance from a relational db to a nested collection.
I'm still very new to category theory (started studying this year actually), so it's very likely that the solution might not at all complicated, and I'm just unable to grasp it. Or it might be quite complicated, and I just have no idea what I'm getting into.

Simonas Tutlys (May 24 2022 at 07:55):

You can do category theory in category theory so I get where you're coming from - I'm self taught and CT is less of a dense forest and more like fractal matroshka doll where you can model one thing in various equally abstract ways which turn out to be equivalent for some notion of equivalence.It's the most useful empty bowl in math.

Spencer Breiner (May 24 2022 at 13:50):

Hi @Davi Sales Barreira: Nesting (also related to aggregation) turns out to be surprisingly tricky in this context. Spivak has made some recent progress on the problem using polynomial functors, but I wouldn't say the issue is resolved yet,

To understand what's going on, lets build a schema with one object (entity) $X$ and one arrow $\ell:X\to P=\mathcal{P}(\mathbb{Z})$ .

In order to "disaggregate" the subsets, we could extend the schema with a [[bundle]] over $X$ . If you're not familiar with this idea, we can represent a families of sets indexed by $X$ a function $f:Y\to X$ ; each element $x_i$ corresponds to the set of $y$ 's that map to it (the [[fiber]]).

This forgets about the "identity" of the elements in the list, but we can put that back in by adding an attribute $v:Y\to \mathbb{Z}$ . So the new schema will look like this
$\mathbb{S}\leftarrow Y \rightarrow X$

Using bundles and spans to replace nested data structures is a relatively routine move in database engineering.

The problem of aggregation is basically the reverse: if we have a bundle in our schema, how can we calculate aggregate operation like a sum over the fibers. The simplest case is just counting how many elements are in a fiber (or even whether there are any).

"Aggregate attributes" like this don't behave in the same way as ordinary attributes; in particular, they are not preserved by natural transformations. You can check this paper for a discussion of some of the issues involved.

Davi Sales Barreira (May 24 2022 at 13:55):

thanks a lot @Spencer Breiner ! Some of the concepts are new to me, and I'm still trying to figure out the machinery necessary. You gave me a lot to think about. Indeed, I'm aware of the Functorial Aggregation paper by Spivak. It's in my reading list ;D I just didn't start yet because I need more solid foundation in CT.
I've actually set a message to David Spivak asking whether the nested structure had already been figured out, since it was related to the aggregation. He told me that it was still not resolved.

Spencer Breiner (May 24 2022 at 13:58):

The Functorial Aggregation paper is pretty hard. I'm still digesting it myself. The scheduling paper I pointed to is intended to be accessible, so that's probably a good place to start.

Spencer Breiner (May 24 2022 at 13:59):

Do you have an example or use case that you want to apply? I find it helps to have a concrete example in front of me when I want to understand an abstract concept.

Spencer Breiner (May 24 2022 at 14:01):

A good exercise is to think about the difference between representing a subset in a schema as an arrow $Y\to X$ versus a truth function $X\to {\rm Bool}$ . The instances are the same, but the morphisms are different.

Spencer Breiner (May 24 2022 at 14:03):

Whoops! We actually need a limit sketch/schema $Y\rightarrowtail X$ (rather than just a plain category) to get the instances to agree.

Daniel Geisler (May 24 2022 at 15:05):

See Analytic Combinatorics - Algorithms Project pg.124
Example using combinatorial species -
Set Partitions: F=set(set(Z, card≥1)) while the recursive version is Total Partitions: H=Z+set(H, card≥2).

Davi Sales Barreira (May 24 2022 at 17:06):

Spencer Breiner said:

Do you have an example or use case that you want to apply? I find it helps to have a concrete example in front of me when I want to understand an abstract concept.

Yeah, I do.

Davi Sales Barreira (May 24 2022 at 17:06):

Daniel Geisler said:

See Analytic Combinatorics - Algorithms Project pg.124
Example using combinatorial species -
Set Partitions: F=set(set(Z, card≥1)) while the recursive version is Total Partitions: H=Z+set(H, card≥2).

Thanks, I'll take a look.

Davi Sales Barreira (May 24 2022 at 17:08):

@Spencer Breiner , I'll take a look at the article you posted. At the moment, I've mostly studied CT via more applied books (e.g. Seven Sketches; Category Theory for the Sciences). I'm actually finishing Category Theory for the Sciences, do you recommend any follow up book? Many of the terms you've used I haven't actually encountered in these books.

Davi Sales Barreira (May 24 2022 at 22:13):

I'm interested in the interplay between a relational db and a nested data structure. I want to formalize how to take RDB instances to a nested format.