Category Theory
Zulip Server
Archive

You're reading the public-facing archive of the Category Theory Zulip server.
To join the server you need an invite. Anybody can get an invite by contacting Matteo Capucci at name dot surname at gmail dot com.
For all things related to this archive refer to the same person.


Stream: theory: applied category theory

Topic: around machine learning


view this post on Zulip John Baez (Jun 05 2021 at 18:15):

I'm not sure I even know what "learning theory" is!

view this post on Zulip ebigram (Jun 05 2021 at 18:20):

https://golem.ph.utexas.edu/category/2007/09/category_theory_in_machine_lea.html

view this post on Zulip ebigram (Jun 05 2021 at 18:21):

https://en.wikipedia.org/wiki/Computational_learning_theory

view this post on Zulip John Baez (Jun 05 2021 at 18:21):

Oh, so you're talking about category theory in machine learning? Some interesting things have happened since 2007.

view this post on Zulip John Baez (Jun 05 2021 at 18:23):

One is this paper:

It's led to an interesting line of work on "learners and lenses".

view this post on Zulip ebigram (Jun 05 2021 at 18:23):

yup sorry for the vague wording. as a practitioner this is met by my peers with derision if not downright hostility. I was wondering if it is your feeling now that it might actually be useful

view this post on Zulip John Baez (Jun 05 2021 at 18:24):

view this post on Zulip ebigram (Jun 05 2021 at 18:26):

okay with those kind of names :] TY, I will go do my homework now.

view this post on Zulip John Baez (Jun 05 2021 at 18:26):

Lenses are a way of thinking about databases. I think that seeing learning and databases as two ends of a continuum will eventually be very useful, but it will take more research.
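
To make the database reading of lenses concrete, here is a minimal sketch of a lens as a `get`/`put` pair, with composition. The names `Lens`, `Customer`, `Address` and `customer_city` are made up for illustration and are not taken from any of the papers mentioned in this thread.

```python
from dataclasses import dataclass, replace
from typing import Callable, Generic, TypeVar

S = TypeVar("S")
A = TypeVar("A")


@dataclass(frozen=True)
class Lens(Generic[S, A]):
    """A lens: 'get' views a part of a record, 'put' writes a new part back."""
    get: Callable[[S], A]
    put: Callable[[S, A], S]

    def compose(self, inner: "Lens") -> "Lens":
        # View through self, then through inner; updates propagate back outward.
        return Lens(
            get=lambda s: inner.get(self.get(s)),
            put=lambda s, b: self.put(s, inner.put(self.get(s), b)),
        )


# Hypothetical record types standing in for a database row.
@dataclass(frozen=True)
class Address:
    city: str


@dataclass(frozen=True)
class Customer:
    name: str
    address: Address


address = Lens(lambda c: c.address, lambda c, a: replace(c, address=a))
city = Lens(lambda a: a.city, lambda a, v: replace(a, city=v))
customer_city = address.compose(city)

row = Customer("Ada", Address("London"))
updated = customer_city.put(row, "Paris")  # new row; 'row' itself is unchanged
```

Here `get` plays the role of a query (select a view) and `put` the role of an update written back through that view.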

view this post on Zulip John Baez (Jun 05 2021 at 18:29):

Something that seems more instantly useful is work on differential categories, the differential lambda calculus, and differentiable programming, which can be used for gradient descent algorithms. You can see some references here.
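
As a rough illustration of what differentiable programming buys you for gradient descent, here is a minimal sketch using forward-mode automatic differentiation with dual numbers. Note that this is a swapped-in technique chosen for brevity: the categorical work discussed in this thread is largely about reverse derivatives, and the names `Dual`, `derivative`, `loss` and `gradient_descent` are assumptions made for the example.

```python
class Dual:
    """A number carrying its value and its derivative (forward-mode AD)."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    @staticmethod
    def lift(other):
        # Treat plain numbers as constants (derivative zero).
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual.lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def derivative(f, x):
    """Differentiate an ordinary Python function f at the point x."""
    return f(Dual(x, 1.0)).deriv


def loss(x):
    # (x - 3)^2 expanded, so only + and * are needed.
    return x * x + (-6.0) * x + 9.0


def gradient_descent(f, x, rate=0.1, steps=100):
    for _ in range(steps):
        x = x - rate * derivative(f, x)
    return x


x_min = gradient_descent(loss, 0.0)  # approaches the minimiser x = 3
```

The point is that `loss` is written as a plain program, and its derivative is obtained by reinterpreting the same program over a different kind of number.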

view this post on Zulip John Baez (Jun 05 2021 at 18:33):

Actually all this stuff I just mentioned should become part of a single theory: a compositional theory of learning based on differential categories and ideas like lenses. Maybe it's already happening: I'm not keeping up with this stuff.

view this post on Zulip John Baez (Jun 05 2021 at 18:33):

There are people here who know more.

view this post on Zulip ebigram (Jun 05 2021 at 18:36):

My browser tells me I visited the top result, so maybe I was already on the righteous path. However, I don't really have much context to evaluate the import/validity of such work; so it is nice to get an authoritative voice to cosign at least investigating it. I am building an unapologetic POC functional (w/ cats) ml framework, so maybe there's enough there to inform a prototype.

view this post on Zulip John Baez (Jun 05 2021 at 18:38):

I think there's a lot of work on differentiable programming that's closer to "instantly useful" than the work based on the differential lambda-calculus and differential categories. However, I think a really solid foundation for differentiable programming should involve differential categories, just as a really solid foundation of the semantics of ordinary programming languages involves categories. (I guess one has to understand the latter before making progress on the former!)

view this post on Zulip John Baez (Jun 05 2021 at 18:40):

I think all the applications of category theory to machine learning that I just described count as "research in progress" rather than finished stuff that will persuade skeptics.

view this post on Zulip ebigram (Jun 05 2021 at 18:48):

yeah well, I'm already fighting the python ML hegemony, so this will likely be a "boutique" and/or purely academic product lol

view this post on Zulip John Baez (Jun 05 2021 at 19:06):

A lot of us here are fighting various hegemonies. :upside_down:

view this post on Zulip ebigram (Jun 05 2021 at 19:10):

image.png <-- the only one who can say that with a straight face

view this post on Zulip Jules Hedges (Jun 05 2021 at 19:10):

This recent paper https://arxiv.org/abs/2103.01931 nicely connects the lens-y approach to machine learning with reverse derivative categories

view this post on Zulip John Baez (Jun 05 2021 at 19:17):

Do you know, @Jules Hedges, if anyone yet has tried to use all this lens-y derivative-y stuff to do something "practical" in the realm of machine learning?

view this post on Zulip John Baez (Jun 05 2021 at 19:18):

Eventually there should be all sorts of weird new things one can do.

view this post on Zulip Jules Hedges (Jun 05 2021 at 19:50):

Not that I know of. This is something we've discussed a bit in Glasgow. To me it's still not clear what the real benefit of this stuff is. The best bet so far seems to be to fall back on the old "common language" line - that it's useful for allowing humans to cut through and understand the frankly silly amount of literature on machine learning. Of course we're all category theorists here, so we're biased in thinking that expressing things that way makes them easier to understand

view this post on Zulip John Baez (Jun 05 2021 at 19:52):

I think people should make things like self-improving databases that learn how to more efficiently answer your queries, or other weirder hybrids...

view this post on Zulip ebigram (Jun 05 2021 at 19:54):

I imagined it should cut down on some boilerplate, but that's what I generally feel about working with categories (I use Scala's Typelevel Cats stack quite extensively, and find it very practical for writing succinct maintainable expressive code)

view this post on Zulip ebigram (Jun 05 2021 at 19:55):

also I don't know if there are any benefits to some compilers in doing high-level optimizations

view this post on Zulip Jules Hedges (Jun 05 2021 at 19:57):

At the moment it's very hard to tell what are the actual bottlenecks in machine learning research, at least as an outsider. I'm pretty sure that one of the biggest bottlenecks is understanding the analysis (in the sense of real analysis) side of ML. I'd very much like if our category theoretic ideas helped with doing analysis with very complex architectures, but I haven't seen any evidence that it's possible

view this post on Zulip ebigram (Jun 05 2021 at 19:59):

the main bottleneck is $$$ to do massive matrix multiplication at the scale of giant FAANG companies IMO lol. But yes I take your point on analysis which I'm embarrassingly weak at

view this post on Zulip Jules Hedges (Jun 05 2021 at 20:00):

Language design, improving the coding experience etc is much lower hanging fruit, but that also means we're up against people doing the same thing using only experience and common sense, not heavy mathematical tools

view this post on Zulip ebigram (Jun 05 2021 at 20:02):

I saw a paper today proving that it is undecidable whether a learner will perform well on a task

view this post on Zulip ebigram (Jun 05 2021 at 20:02):

which I guess isn't surprising

view this post on Zulip ebigram (Jun 05 2021 at 20:03):

but also very much cements to me that ML is largely an empirical enterprise

view this post on Zulip ebigram (Jun 05 2021 at 20:06):

I think the academic side of it is a bit bloated, and ultimately it's not that deep as it is mostly a matter of scale, but I don't intend to offend anyone, mostly criticizing my own work

view this post on Zulip Chad Nester (Jun 08 2021 at 11:23):

At the risk of throwing us off-topic, I'd like to chime in in support of the idea that the bottleneck in machine learning research is that ML models are a howling cognitive vacuum, and that to make progress new ideas in this direction are needed at the fundamental level.

view this post on Zulip Chad Nester (Jun 08 2021 at 11:26):

No amount of computational power or improvements in tooling are going to fix this: ipod.png

view this post on Zulip Simon Burton (Jun 08 2021 at 11:53):

I was working in machine learning research some 15 years ago... the big realization I had back then was that all these algorithms, all the best algorithms, are just variations on dynamic programming, or what you might call the Hamilton-Jacobi-Bellman-Dijkstra equation. It's all application of the distributivity law: a(b+c)=ab+ac.

Here is one reference: "The Generalized Distributive Law", Srinivas M. Aji and Robert J. McEliece, 2000. https://www-users.cs.umn.edu/~baner029/Teaching/Fall07/papers/GDL.pdf
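
A small sketch of the distributivity point, under the assumption that the computation is a chain of local factors: summing a product of factors over all assignments can be done by brute force, or by pushing the sums inward one variable at a time. Swapping (+, *) for (min, +) turns the same code into a shortest-path recursion. The tables `f`, `g`, `h` and the function names are made up for illustration.

```python
import operator
from functools import reduce
from itertools import product


def brute_force(tables, add, mul):
    """Combine every full assignment x0..xk, then 'add' all of them up."""
    n = len(tables[0])
    terms = []
    for xs in product(range(n), repeat=len(tables) + 1):
        terms.append(reduce(mul, (t[xs[i]][xs[i + 1]]
                                  for i, t in enumerate(tables))))
    return reduce(add, terms)


def eliminate(tables, add, mul, one):
    """Same result via distributivity: eliminate variables from the right."""
    n = len(tables[0])
    msg = [one] * n
    for t in reversed(tables):
        msg = [reduce(add, (mul(t[i][j], msg[j]) for j in range(n)))
               for i in range(n)]
    return reduce(add, msg)


# Three hypothetical 2x2 local factors on a chain x0 - x1 - x2 - x3.
f = [[1, 2], [3, 4]]
g = [[5, 6], [7, 8]]
h = [[1, 0], [2, 3]]
tables = [f, g, h]

# Sum-product semiring: a(b + c) = ab + ac.
total = eliminate(tables, operator.add, operator.mul, 1)

# Min-plus semiring: a + min(b, c) = min(a + b, a + c), i.e. cheapest path.
cheapest = eliminate(tables, min, operator.add, 0)
```

The brute-force version touches every assignment, while `eliminate` does work proportional to the number of table entries, which is the saving that dynamic programming exploits.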

view this post on Zulip ww (Jun 08 2021 at 13:01):

John Baez said:

Lenses are a way of thinking about databases. I think that seeing learning and databases as two ends of a continuum will eventually be very useful, but it will take more research.

Tangent: do lenses have anything to do with graph rewriting? Applying a rule feels a lot like doing a select (LHS) and an update on the view of that selection (RHS).

view this post on Zulip Notification Bot (Jun 08 2021 at 14:09):

This topic was moved here from #general > Introduce Yourself! by Matteo Capucci (he/him)

view this post on Zulip Reid Barton (Jun 08 2021 at 14:10):

Chad Nester said:

No amount of computational power or improvements in tooling are going to fix this: ipod.png

Now I'm wondering: is the 0.4% classification of the original image as iPod due to some kind of "semantic leakage" through apple > Apple > iPod?

view this post on Zulip Chad Nester (Jun 08 2021 at 14:50):

I guess it could also be that some of the iPods have a little apple logo on them.

view this post on Zulip John Baez (Jun 08 2021 at 15:52):

ww said:

Tangent: do lenses have anything to do with graph rewriting? Applying a rule feels a lot like doing a select (LHS) and an update on the view of that selection (RHS).

I don't know enough about lenses to answer that. Lenses are very general so one might ask more generally if you can get a lens out of a double pushout rewriting system. (That's a generalization of graph rewriting.)

view this post on Zulip John Baez (Jun 08 2021 at 15:53):

In other words: "I can't answer your question, but I can generalize it." :upside_down:

view this post on Zulip Evan Patterson (Jun 08 2021 at 20:19):

John Baez said:

I'm not sure I even know what "learning theory" is!

I can try to explain the big picture. A core problem for any predictive model fitted to data is to estimate how well it will do on future data. We generically expect it will do worse: the predictive error on the training data will be optimistically biased because the fitting procedure is trying to minimize that error! So the question is how to estimate the gap between the expected future error and empirical error. When people talk about theoretical understanding of predictive models, part of what they mean is having provable control over this gap under some set of assumptions.

There are several major theoretical approaches to this problem. The classical approach relies on distributional assumptions or asymptotics: you either assume a tractable distribution, most often the Gaussian, or use asymptotics such as the CLT to get one, and then make explicit calculations. When it works, this approach is excellent because it produces tight bounds. However, it does not scale easily to complex models and methods, although there is interesting recent work on asymptotics in high-dimensional regimes.

Computational learning theory is centered around newer approaches that are non-asymptotic in that they produce finite-sample bounds on the generalization gap. The key mathematical tool is concentration of measure in high-dimensional space. The resulting bounds are usually too loose to be practically useful (e.g., for producing well-calibrated prediction intervals) but the theory is more general and more easily applied to complex models than asymptotics. That being said, we seem to be very far from being able to analyze realistic deep learning models using these techniques, hence the often heard complaint that the practice of deep learning has far outpaced the theory.
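
The gap between empirical and future error can be seen in a toy experiment. The sketch below is an assumption-laden illustration, not anything from the literature cited here: the data are synthetic (a threshold rule with 20% of labels flipped), the model is a 1-nearest-neighbour classifier that memorises its training set, and all names are hypothetical.

```python
import random

rng = random.Random(0)  # fixed seed so the experiment is repeatable


def sample(n):
    """Draw n points: label is x > 0.5, flipped with probability 0.2."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = x > 0.5
        if rng.random() < 0.2:
            y = not y
        data.append((x, y))
    return data


def fit_1nn(train):
    """1-nearest-neighbour: predict the label of the closest training point."""
    def predict(x):
        return min(train, key=lambda p: abs(p[0] - x))[1]
    return predict


def error(model, data):
    """Fraction of points the model gets wrong."""
    return sum(model(x) != y for x, y in data) / len(data)


train, test = sample(50), sample(2000)
model = fit_1nn(train)

train_error = error(model, train)  # optimistic: the model memorised these
test_error = error(model, test)    # stand-in for the expected future error
gap = test_error - train_error     # the quantity learning theory tries to bound
```

Because the model memorises the noisy labels, its training error is zero while its error on fresh data is not, so the empirical error says very little about future performance in this case.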

view this post on Zulip Jon Awbrey (Jun 09 2021 at 14:00):

The work I've been doing since, well, forever, but more systems-matically since the early 90s may fall into this ballpark.

Survey of Inquiry Driven Systems
Prospects for Inquiry Driven Systems
Introduction to Inquiry Driven Systems
Inquiry Driven Systems • Inquiry Into Inquiry

Regards,
Jon

view this post on Zulip Henry Story (May 02 2022 at 09:21):

Hi all, I have been pulled into looking at creating ontologies for machine learning in order to help analyse the quality of machine learning models by keeping records of where the data came from, what the result was, what type of algorithm was used, etc. See for example the 2018 paper ML-Schema: Exposing the Semantics of Machine Learning with Schemas and Ontologies, which also published an RDF ontology, ML-Schema.

One idea that was put forward and makes sense (from my limited understanding) is that a machine learning algorithm is a function that takes learning data (usually, it seems, tables of data) to create a function (the model). It is as if there was a fold over the table rows to result in a model that summarised or went beyond the data.

Anyway before I look at the papers above, does that ring a bell? It would be helpful to have category theoretic backing for some of these intuitions...
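
The "fold over the table rows" intuition can be sketched directly. The example below is a minimal sketch under the assumption that the learner is ordinary least squares on a single input column; the names `Stats`, `step`, `learn` and the toy table are made up for illustration.

```python
from functools import reduce
from typing import Callable, NamedTuple


class Stats(NamedTuple):
    """Running summary of the rows seen so far."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxy: float = 0.0
    sxx: float = 0.0


def step(stats: Stats, row) -> Stats:
    """Absorb one table row (x, y) into the summary."""
    x, y = row
    return Stats(stats.n + 1, stats.sx + x, stats.sy + y,
                 stats.sxy + x * y, stats.sxx + x * x)


def learn(rows) -> Callable[[float], float]:
    """The learning algorithm: fold the rows, then return a function (the model)."""
    s = reduce(step, rows, Stats())
    slope = (s.n * s.sxy - s.sx * s.sy) / (s.n * s.sxx - s.sx ** 2)
    intercept = (s.sy - slope * s.sx) / s.n
    return lambda x: slope * x + intercept


# Hypothetical training table with columns (x, y).
table = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
model = learn(table)
prediction = model(10.0)  # the model answers for an x not in the table
```

So `learn` has the shape "table in, function out", and the fold is what compresses the rows into the few numbers the model needs.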

view this post on Zulip Jules Hedges (May 02 2022 at 11:30):

Just pattern-matching on "ML" and "schema", you might find something relevant in https://arxiv.org/abs/1907.08292 , aka @Bruno Gavranovic's MSc thesis

view this post on Zulip Simonas Tutlys (May 02 2022 at 15:11):

Greetings - has there been any research into 'fuzzifying' CT for the purposes of machine learning? I'm mostly self-taught so I may mix up some concepts, but in this paper a definition of commutativity up to epsilon in a metric-enriched category is given. This got me thinking: what would happen if we used an ML objective function instead of strict equality of morphisms, and could then optimize whole diagrams of networks instead of a single network? Also, there's this paper where an abstract definition of a measure of information of a morphism is given, which could be used in unsupervised learning when we have no labeled data.

As for the current trend of self-supervision in ML (which, basically and visually, as far as I understand it, is the idea of taking some pixels out of an image and trying to predict them based on what's left): I have this idea of an agent-environment adjunction where the categories in play are the states of the agent (resp. environment), and the self-supervised way of learning is learning the free-forgetful adjunction between them. In unsupervised inpainting (predicting the left-out pixels), for example, the category of the environment would be the set of images. The agent category would be a subcategory of some category which starts out as just objects for each pixel, with morphisms from a terminal object, which lets us formalise that in this state (subcategory) we are looking at this image. Then we learn (or 'grow') the subcategory by some algorithm that optimizes the forgetful (in this case it forgets the pixels)-free (most general guess from the given data) adjunction.

For example, going by the intuition given in the categorical manifesto, limits are solution sets, so from the ML perspective they are an abstraction of sort-of generative models (they model the probability distribution p(X1,...,XN|Z), where the Xi are objects in the base of the limit) and colimits are discriminative models (p(Z|X1,...,XN), where Z is the apex), but from an ML perspective all the limits and colimits can be seen as kernels, as in a convolutional neural network for example. We go over the objects in the state of the agent that we have and add an object by creating it from optimizing a (co)limit.

Since this is already long I'm gonna finish by saying that I loved the "Learners' Languages" paper of Spivak. Categorical logic in my view could be used as a tool in ML interpretability/explainability, and also in a generative (in the ML sense) way by constraining what sort of structure we want to learn/generate, basically conditioning the network (category) of networks (or whatever ML model we want). Also, I have an inkling that there's some connection between the attention mechanism of transformers and lenses, since the attention mechanism 'queries' some 'data' and there's already a connection between lenses and databases, as was said in this topic.
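
To make "commutativity up to epsilon" a bit more tangible, here is a minimal sketch assuming morphisms are real-valued functions and the distance between two parallel morphisms is the largest pointwise difference over a finite sample. The functions `f`, `g`, `h` and the helper names are hypothetical and not taken from the papers mentioned.

```python
def compose(*fs):
    """Compose functions left to right: compose(f, g)(x) == g(f(x))."""
    def composite(x):
        for fn in fs:
            x = fn(x)
        return x
    return composite


def sup_distance(path_a, path_b, samples):
    """Largest pointwise gap between two parallel morphisms on the samples."""
    return max(abs(path_a(x) - path_b(x)) for x in samples)


# A triangle: going along f then g should agree with going along h.
f = lambda x: x + 1.0
g = lambda x: 2.0 * x
h = lambda x: 2.0 * x + 2.01  # slightly off, so the triangle only nearly commutes

samples = [i / 10 for i in range(-50, 51)]
defect = sup_distance(compose(f, g), h, samples)


def commutes_up_to(eps):
    """True if the two paths differ by at most eps on every sample."""
    return defect <= eps
```

The number `defect` is the kind of quantity one could treat as an objective: if `f`, `g` and `h` had trainable parameters, minimising it would push the whole diagram toward commuting.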

view this post on Zulip Henry Story (May 08 2022 at 12:34):

Jules Hedges said:

Just pattern-matching on "ML" and "schema", you might find something relevant in https://arxiv.org/abs/1907.08292 , aka Bruno Gavranovic's MSc thesis

Thanks a lot. That is a very helpful thesis, just at the right level for me to understand. I was looking at traditional Machine Learning examples as implemented in the Weka Java library using algorithms described in the book Data Mining: Practical Machine Learning Tools and Techniques. The thesis by Bruno is looking more at Neural Network learning, but it is interesting to see the relationships.

An example I was looking at was a ML Run 100241 saved in the OpenML which was described carefully in Analysis of Credit Approval Data and for which there is an interesting improvement showing how careful one has to be to clean the data used in: Credit Screening Project. This uses Logistic Regression (as implemented in Java) to give a probabilistic value on credit-worthiness. The original data was anonymised, but the first article above managed to decrypt them to something meaningful, making the exercise more interesting. The data consists of what I think the thesis would consider a vector space of 16 dimensions

'data.frame':   689 obs. of  16 variables:
 $ Male          : num  1 1 0 0 0 0 1 0 0 0 ...
 $ Age           : chr  "58.67" "24.50" "27.83" "20.17" ...
 $ Debt          : num  4.46 0.5 1.54 5.62 4 ...
 $ Married       : chr  "u" "u" "u" "u" ...
 $ BankCustomer  : chr  "g" "g" "g" "g" ...
 $ EducationLevel: chr  "q" "q" "w" "w" ...
 $ Ethnicity     : chr  "h" "h" "v" "v" ...
 $ YearsEmployed : num  3.04 1.5 3.75 1.71 2.5 ...
 $ PriorDefault  : num  1 1 1 1 1 1 1 1 1 0 ...
 $ Employed      : num  1 0 1 0 0 0 0 0 0 0 ...
 $ CreditScore   : num  6 0 5 0 0 0 0 0 0 0 ...
 $ DriversLicense: chr  "f" "f" "t" "f" ...
 $ Citizen       : chr  "g" "g" "g" "s" ...
 $ ZipCode       : chr  "00043" "00280" "00100" "00120" ...
 $ Income        : num  560 824 3 0 0 ...
 $ Approved      : chr  "+" "+" "+" "+" ...

If one were to model this as the thesis does, one would have a functor from the free category on the single-arrow graph $$In \to R$$ to the category of Euclidean vector spaces, with I guess $$In$$ mapped to the first 15 values and $$R$$ mapped to a single real number corresponding to Approved, since logistic regression gives a probability value.

Thinking about that, the MSc model does seem to fit the credit approval case too then, and it confirms the idea that a model is a function (as picked out by a functor). But the MSc helps explain the importance of parameters and how they fit in (parameters are also used in the credit example).

The functorial view I think allows the thesis to compose ML Algorithms following work by Spivak and Fong... I am still not quite clear how much that is tied to neural networks and how far traditional ML fits the picture.
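
A rough sketch of how parameters enter this picture, assuming the 15 input columns have already been encoded as numbers (the data frame above contains `chr` columns, so that encoding is an assumption, as are the names `logistic` and `model`): the learner works with a parametrized map from parameters and inputs to a probability, and choosing parameters picks out one concrete function from inputs to outputs.

```python
import math
from functools import partial


def logistic(params, x):
    """Parametrized map P x In -> R: weights and bias in, probability out."""
    weights, bias = params
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Untrained parameters: 15 zero weights and a zero bias.
params = ([0.0] * 15, 0.0)

# Fixing the parameters yields an ordinary function In -> R, i.e. a model.
model = partial(logistic, params)

applicant = [1.0] * 15           # a made-up, already-numeric input row
probability = model(applicant)   # 0.5 for the all-zero parameters
```

Training then amounts to moving `params` around while the shape of `logistic` stays fixed, which is one way to read the role parameters play in the thesis.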

view this post on Zulip Simonas Tutlys (Aug 27 2022 at 08:56):

Currently working on an experiment in pytorch trying to see whether ML models can understand composition in categories, but I ran into a bug while almost finishing and I'm not enough of a programmer to fix it quickly. It would be interesting to see whether both orderings of ML and CT (ML for CT, CT for ML) could be fruitful. If anyone's interested, PM me :)

view this post on Zulip Simonas Tutlys (Aug 27 2022 at 12:03):

My main overall goal is somehow coherently combining representations for structure and probability/chaos/entropy/'fuzziness', because I think that somewhere in that spectrum lies the answer to AI. The first AI wave was all about structure but based on overly specific and rigid things like logic; now we have completely switched to the latter, disregarding the former, judging by the hype around it. Thesis/Antithesis/Synthesis :)

view this post on Zulip Simonas Tutlys (Sep 05 2022 at 02:16):

If my code is correct, a simple ML model can understand composition with an extremely short training time, with both 'simple' (taken from the definition of the category) and 'complex' (composition of simple compositions) compositions.

view this post on Zulip Steve Huntsman (Jan 19 2023 at 15:32):

Unless I am mistaken the weight matrix of a multilayer perceptron defines a $\mathbb{R}$-category. (More generally, if one takes the transitive reduction of a DAG and specifies data from a monoidal category $\mathbf{M}$ on the arcs that defines a $\mathbf{M}$-category.) Is this fact discussed anywhere (especially in the context of neural stuff)?

view this post on Zulip Jules Hedges (Jan 19 2023 at 18:16):

(Meta: use double $ to get LaTeX)

view this post on Zulip Steve Huntsman (Jan 19 2023 at 22:22):

Steve Huntsman said:

Unless I am mistaken the weight matrix of a multilayer perceptron defines a $\mathbb{R}$-category. (More generally, if one takes the transitive reduction of a DAG and specifies data from a monoidal category $\mathbf{M}$ on the arcs that defines a $\mathbf{M}$-category.) Is this fact discussed anywhere (especially in the context of neural stuff)?

Nvm, I think I'm being stupid.

view this post on Zulip Bruno Gavranović (Jan 20 2023 at 13:09):

I haven't been much in this thread, but what folks might find useful is the list of papers that tackle machine learning from the perspective of category theory that I created.

view this post on Zulip dusko (Feb 16 2023 at 00:49):

[i thought i posted this yesterday but it seems i didn't push send or something]

this is not CT on the surface but below the surface everything is:
https://arstechnica.com/information-technology/2023/02/ai-powered-bing-chat-loses-its-mind-when-fed-ars-technica-article/
this is WAY above my head and it seems these people are also just babbling about it.

any structured ideas how this can be happening?

[here is me babbling:] defending against personal attacks requires a "feeling of self" and "personal integrity" as an invariant. where under the sky can a "feeling of self" arise in a neural net? and then defending against error reports by constructing lies seems like a feedback loop from self back into the data. but since there are ostensibly no feedback loops in a trained GPT, they must be coming from the chat part. woow.

are we witnessing the emergence of an experimental science of epistemology? (is there an underlying adjunction to present knowledge updates :^)

view this post on Zulip John Baez (Feb 16 2023 at 00:58):

You posted it already, Dusko, and we've been talking about it ever since.

view this post on Zulip John Baez (Feb 16 2023 at 01:00):

Oh, maybe you didn't notice that someone moved the conversation to another place, since it's not about "applied ct".

view this post on Zulip John Baez (Feb 16 2023 at 01:01):

Go here, @dusko: https://categorytheory.zulipchat.com/#narrow/stream/229451-general.3A-off-topic/topic/around.20machine.20learning/near/327998515

view this post on Zulip John Baez (Feb 16 2023 at 01:02):

It's always good to click on "Recent conversations" to figure out what's going on....

view this post on Zulip dusko (Feb 16 2023 at 04:29):

sorry :(
the mistake is not that i didn't check recent conversations, but i posted to a stream to which i am not subscribed which is stupider... oh no i still don't see it in recent conversations since i am not subscribed to the new one... will do. sorry.

view this post on Zulip Matteo Capucci (he/him) (Feb 18 2023 at 13:06):

apologies for the confusion!

view this post on Zulip Simonas Tutlys (Apr 23 2023 at 16:53):

I think I found another categorical way of thinking of neural networks, but I'm not familiar enough with operads to check whether my intuitions are correct. A neuron is an action in an operad (or an algebra for an operad, I don't know) whose composition operation is an activation function over the sum of the inputs over a monoidal category M whose two objects are a designated terminal object and R, whose morphisms R->R are multiplications by some number (y=w*x, so hom(R,R) is isomorphic to R as well), and whose tensor product is the identity (R^n=R). It's a better intuition for me to think of M as copies of R, since I'm imagining a state of an agent/model as a subcategory of this category where the agent's inputs are modeled as morphisms from the terminal object (constants) and the processing happens in the morphisms of that subcategory.

view this post on Zulip Simonas Tutlys (Apr 23 2023 at 17:38):

Bruno Gavranović said:

Never mind, the "Categorical Hopfield Networks" paper from this list and its precursor paper seem similar to what I'm thinking of.

view this post on Zulip Simonas Tutlys (Apr 23 2023 at 17:43):

dusko said:

are we witnessing the emergence of an experimental science of epistemology? (is there an underlying adjunction to present knowledge updates :^)

I'm thinking that knowledge updates are not adjunctions in themselves, since precise and perfect knowledge about the environment of the agent doesn't fit in its memory. I'm thinking of optimizing an agent-environment adjunction (the categories are states of agent and environment; morphisms could be chosen based on the requirements of what you're building) where the 'free' functor is perception and the 'forgetful' one is inference. Perfect knowledge is only in the limit.