@Fabrizio Genovese Can you explain your claim on #general that switching from the free pregroup to the free autonomous/CCC defeats the purpose? I thought about exactly this a year or 2 ago and convinced myself that it's a good idea, and that Anne Preller was right and everyone else was wrong
I think of it like this: the free autonomous cat is a proof-relevant version of the free pregroup. So the linear map you apply to your word vector depends on the derivation, not just on the fact that a derivation exists
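Roughly (my own gloss, not anyone's official formulation): a pregroup is posetal, so a reduction is at most one arrow and all parses of a type string get identified,

$$ t \le s \ \text{(pregroup: at most one arrow)} \qquad \text{vs.} \qquad f, g : t \to s,\ f \neq g \ \text{(free autonomous category)}, $$

and a monoidal functor $F$ into vector spaces is then free to send the two derivations to different linear maps, $F(f) \neq F(g)$.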
In general, generative grammar is logic, and history tells us [controversial claim but probably not around here] that categorical semantics is better than posetal semantics
Jules Hedges said:
Fabrizio Genovese Can you explain your claim on #general that switching from the free pregroup to the free autonomous/CCC defeats the purpose? I thought about exactly this a year or 2 ago and convinced myself that it's a good idea, and that Anne Preller was right and everyone else was wrong
So the original idea of DisCoCat is "we tag every word with an element of the pregroup (noun, sentence, whatevs) and then we use the pregroup rules to infer the wiring". The reason you can't do this is that you can prove the vector space you get is trivial (as Preller did). So we resort to generating this huge category with all the words in it. Then things clearly work, but I guess you lose the original purpose of the idea, which was separating tags from semantics and encoding the tagging system in the functor. Now tags, semantics and tagging systems are all more or less conflated in the free CCC you generate.
So yes, categorically it works, but from a practical/computational point of view I find it to be quite unsatisfying.
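To make the tagging idea concrete (a minimal sketch in standard pregroup notation, not tied to any particular paper): you tag a noun with $n$ and a transitive verb with $n^r s n^l$, and the pregroup contractions $x \cdot x^r \le 1$ and $x^l \cdot x \le 1$ give the reduction

$$ n \cdot (n^r s n^l) \cdot n \;\le\; s, $$

from which the wiring of the string diagram is read off.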
Yes, probably you want to flip that
What I mean is that at that point tags and their assignments to words become all jumbled together
So yes, that works, but then I don't see a real reason for not doing the wires directly in the semantics
Rongmin Lu said:
Practically speaking, that's how I'd imagine languages behave. The ERNIE work I cited over in #general showed how their system improved when they allowed some form of "chunking", which seems to me to mess up the neat tagging system envisioned in the original idea (at least to the best of my understanding of your description). Their original motivation was to improve the processing of Chinese, where you often smoosh characters together to make a word that has a meaning not obvious from the surface meanings of its characters. Similar things happen in English as well: proper nouns, idioms, sayings, etc.
Yes. This has to do with the fact that, as you said, current methods basically work well only for Romance languages, maybe Indo-European ones. I'm a language geek myself and I've had my share of non-European languages, and I agree that current approaches in DisCoCat scale quite badly outside of Romance/Germanic languages.
So the reason why I focused on hieroglyphs is something that I then found to happen also in Asian languages such as Japanese: basically, people are happy with having a lot of indeterminacy going around, to be clarified from context.
So while in English, say, I always have to communicate when an action is happening (done by picking a tense), this is not true in Egyptian, or in Japanese to some extent. Japanese is particularly interesting, because if on one hand they care very little about being specific in our Western grammatical sense, on the other they are very careful about things we are totally oblivious to, such as who you are speaking to (the same sentence can be said in many different ways depending on the degree of politeness you want to use, and for Westerners it's insane how granular this can be)
Rongmin Lu said:
I'm reminded also of a Twitter thread that might have started with Jules Hedges (although I could be wrong), where someone opined that analysts don't find category theory useful because they're often juggling half a dozen conditions simultaneously. The idea was floated that perhaps we should have a more flexible notion of a category to accommodate such situations, and I believe Evan Patterson mentioned something along that line, although I forgot what the technical name was. Something similar seems to be happening here: my feeling is DisCoCat is rather rigid for the ambitions that people have for it.
I feel the topic measuring non-compositionality
deals exactly with this sort of problem. Language is one of the reasons why we started thinking about it
Well, a friend of mine told me this lifesaver: when you are in Japan and you are still very bad with the language, just begin any conversation by saying "Hey, I'm just learning and I am not good. Could you speak to me as if I were a child?" In this way you sidestep all the politeness levels.
Ancient Greek is quite the opposite. Ancient Greek and Sanskrit are incredibly precise in the Western grammatical sense.
Hebrew, yes, I agree with you. The reason for this is that Sanskrit and Ancient Greek are the languages that retain the most from Proto-Indo-European, which is highly synthetic. Hebrew, on the contrary, is a Semitic language, so a whole different thing.
Fabrizio Genovese said:
Ancient Greek is quite the opposite. Ancient Greek and Sanskrit are incredibly precise in the Western grammatical sense.
Good to know, and not too surprising in hindsight. After all, these two ancient societies were pretty obsessed with grammar and teaching it well.
Those two are by far the most difficult languages I studied.
But I should mention that I explicitly avoided having anything to do with Na-Dene languages (e.g. Navajo) which I think are the most difficult languages on the planet (at least considering the ones I am aware of)
@Rongmin Lu @Fabrizio Genovese Replying to the idea of a functor between grammar and meaning, in whatever direction: neither direction is any good, really. In the first paper I wrote, as well as in the latest, there is a single category that has both structure (including grammar, but not exclusively) and meaning. Understanding what this structure is is part of the effort, in the same way as better understanding what the meaning spaces are, and these two problems are obviously not separate, but a single one.
Rongmin Lu said:
Bob Coecke Fabrizio Genovese Could you please briefly explain what a meaning space is? My nagging suspicion is that it's an abstraction that's not adequately fit for purpose. It may have been useful as a first-approximation prototype, but there may well be higher structure that's been missed with the underlying assumptions.
I can answer only WRT what we were doing when I was doing it. So for instance you have the sentence "Clowns Tell Jokes". Now "Clowns" and "Jokes" are states, that is, vectors that live in vector spaces that you can build more or less easily by doing thesaurus extraction or whatnot. "Tell" is a verb, so it expects a noun on the left and a noun on the right (which you compose using caps) and spits out a state (hence a vector) in some other vector space, which is the space where sentences live. Clearly one of the main problems there is that no one had a clue what this space looked like, or how to build it. Another approach we pursued was a cognitive-oriented one, which is the one I used in my PhD thesis. That was conceptually satisfying, but without learning techniques it was basically impossible to build real-life spaces and do anything more than toy models.
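Spelled out (the standard DisCoCat recipe, in my notation): with $\overrightarrow{\text{clowns}}, \overrightarrow{\text{jokes}} \in N$ and $\overline{\text{tell}} \in N \otimes S \otimes N$, the caps compute

$$ \overrightarrow{\text{clowns tell jokes}} \;=\; (\epsilon_N \otimes 1_S \otimes \epsilon_N)\big(\overrightarrow{\text{clowns}} \otimes \overline{\text{tell}} \otimes \overrightarrow{\text{jokes}}\big) \;\in\; S, $$

so whatever you compute lands in the sentence space $S$, which is exactly the space nobody knew how to build.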
@Bob Coecke claims to have solved this problem in the last paper. The idea of describing verbs as "applying things to nouns" is nice because it gets rid of this sentence space altogether and allows you to live only in the spaces you already have, but I don't know if it works for every verb. About the functor: the first DisCoCat paper was great because there wasn't any functor, and relationships between grammar/semantics were done using products. So yes, it was a single category, but things were still neatly separated, somehow. I still think that a functor Semantics -> Grammar is useful, not from the NLP point of view, but from a linguistic perspective. It is also a nice place to tackle a lot of problems in Applied Category Theory that pop up pretty much everywhere, but are more pronounced for language, such as the fact that categories are more or less bad at dealing with exceptions.
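If I understand the "apply to nouns" idea correctly (this is my own sketch of it, not necessarily what the paper does), the change of shape is roughly

$$ \overline{\text{tell}} \in N \otimes S \otimes N \quad\rightsquigarrow\quad \text{tell} : N \otimes N \to N \otimes N, $$

i.e. the verb updates the noun states themselves, so everything stays in spaces you can already build and no separate sentence space is needed.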
Rongmin Lu said:
Fabrizio Genovese said:
categories are more or less bad at dealing with exceptions.
Ask a CS person maybe? They know a lot about exceptions. :sweat_smile:
Kleisli categories of the relevant monads are an excellent way to deal with exceptions, and they often come with convenient enrichments. I'm actually working in those as we speak.
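A minimal sketch of what I mean, using the `Either` monad as a stand-in exception monad (function names made up for the example):

```haskell
import Control.Monad ((>=>))

-- A morphism a -> b in the Kleisli category of `Either String`
-- is an ordinary function a -> Either String b: it either returns
-- a value or raises an "exception" carried by Left.
safeRecip :: Double -> Either String Double
safeRecip 0 = Left "division by zero"
safeRecip x = Right (1 / x)

safeSqrt :: Double -> Either String Double
safeSqrt x
  | x < 0     = Left "negative argument"
  | otherwise = Right (sqrt x)

-- Kleisli composition (>=>) is composition in that category:
-- exceptions raised at either step propagate automatically.
recipThenSqrt :: Double -> Either String Double
recipThenSqrt = safeRecip >=> safeSqrt
-- recipThenSqrt 4    == Right 0.5
-- recipThenSqrt 0    == Left "division by zero"
-- recipThenSqrt (-4) == Left "negative argument"
```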
Rongmin Lu said:
I still think that a functor Semantics -> Grammar is useful, not from the NLP point of view, but from a linguistic perspective.
I'm not convinced a functor exists, even in a hand-wavy sense. I think you need something less rigid. "Semantics" and "Grammar" probably aren't even categories to begin with.
I'm probably taking this out of context, so I apologize if that's the case. Lawvere theories (which are categories) can model syntax with or without extra equations.
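As a toy illustration (my own sketch, not from any particular paper): the raw terms over a signature are pure syntax, and the equations of the theory are what you choose to impose on top.

```haskell
-- A tiny term syntax for the one-sorted signature of monoids:
-- a constant `E` and a binary operation `Mul`.  These raw terms are
-- pure syntax; quotienting by the unit and associativity equations
-- yields (the hom-sets of) the Lawvere theory of monoids.
data Term v
  = Var v                   -- a generator / variable
  | E                       -- the unit symbol
  | Mul (Term v) (Term v)   -- the multiplication symbol
  deriving Show

-- Interpreting terms in an actual monoid (lists, here) is a model of
-- the theory (morally, a product-preserving functor out of it).
evalList :: (v -> [a]) -> Term v -> [a]
evalList env (Var x)   = env x
evalList _   E         = []
evalList env (Mul s t) = evalList env s ++ evalList env t
```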
Stelios Tsampas said:
Rongmin Lu said:
I still think that a functor Semantics -> Grammar is useful, not from the NLP point of view, but from a linguistic perspective.
I'm not convinced a functor exists, even in a hand-wavy sense. I think you need something less rigid. "Semantics" and "Grammar" probably aren't even categories to begin with.
I'm probably taking this out of context, so I apologize if that's the case. Lawvere theories (which are categories) can model syntax with or without extra equations.
Maybe I went overboard with "Grammar", but I feel that part of the reason why "Semantics" is so difficult is that it is very likely not a category. I'm not saying CT is of no use here. Rather, a plain vanilla category is probably not the abstraction you're looking for, and so it's probably not a good idea to ask for a plain vanilla functor.
Rongmin Lu said:
Maybe I went overboard with "Grammar", but I feel that part of the reason why "Semantics" is so difficult is that it is very likely not a category. I'm not saying CT is of no use here. Rather, a plain vanilla category is probably not the abstraction you're looking for, and so it's probably not a good idea to ask for a plain vanilla functor.
A plain, vanilla 1-category is most likely not "Semantics", I'll give you just that :P.
Rongmin Lu said:
Anyway, now that I get a rough idea of what these spaces are, here's what I think is missing: recursion. I haven't seen any provision for recursion in this framework, and per Chomsky, this is a key feature of human language. My guess is that you need to provide for recursion in order to handle any expressions less trivial than simple grade 1 sentences, which means exploring more intricate categorical structures. The properads thread may be one place to look. I see polycategories gathering a lot of interest here, and at first glance, these look useful. Monads (and maybe 2-monads) too, perhaps.
I spent an entire chapter of my thesis disagreeing with Chomsky about this. I don't think recursion is a key feature of human language, and frankly I don't think the Chomskyan approach to language will get us very far.
@Rongmin Lu @Fabrizio Genovese One can talk endlessly about what is missing and what is not. Evidently, unlike physics, practical NLP is not an exact science. Things make progress by little steps, sometimes taking many centuries... it took Einstein's relativity theory for putting the sun in the middle to outperform epicycles! The initial goal of DisCoCat was to account for grammatical structure when considering meanings like those living in vector spaces that were used in practical NLP, which at the time were treated as a bag of words. In that we succeeded I think. OK, let's talk about what to do next, a.k.a. what needs improving (cf. planets don't move on circles but on ellipses, and things don't actually move faster than light). OK, there is no functor, I agree, in either direction.
@Bob Coecke said:
The initial goal of DisCoCat was to account for grammatical structure when considering meanings like those living in vector spaces that were used in practical NLP, which at the time were treated as a bag of words. In that we succeeded I think.
That's what I loved about DisCoCat when I first stumbled upon it: finally, something that can counteract Frederick Jelinek's oft-quoted sentiment that
"Every time I fire a linguist, the performance of the speech recognizer goes up".
I think that might have been true when the problem was coming up with something that could solve a low-level pattern recognition problem in NLP, but now that that's (mostly) done, we need to find better tools to cope with the higher-level NLP problems, and DisCoCat turned out to be a step in the right direction.
OK, let's talk about what to do next, a.k.a. what needs improving (cf. planets don't move on circles but on ellipses, and things don't actually move faster than light). OK, there is no functor, I agree, in either direction.
I think DisCoCirc looks interesting, and if you have circuits, operads and friends are probably something to look at. I think semantics and grammar interact both ways, but functors are perhaps too simple to model that interaction. I think we've been ignoring the very important role context plays in semantics, both in terms of syntactic context and "embodied" or "environmental" context. For good reasons, I hasten to add: the latter is definitely still a big and messy problem to deal with, even as the former seems to be getting more tractable.
@Rongmin Lu @Fabrizio Genovese Re:
"Every time I fire a linguist, the performance of the speech recognizer goes up".
With his attitude, the machines on which he does his speech recognition would never have existed; that's all that needs pointing out to morons like that (and the world is packed with them).
@Bob Coecke said:
Rongmin Lu Fabrizio Genovese Re:
"Every time I fire a linguist, the performance of the speech recognizer goes up".
With his attitude, the machines on which he does his speech recognition would never have existed; that's all that needs pointing out to morons like that (and the world is packed with them).
I think that was the prevailing attitude in practical NLP at the time DisCoCat came out, because Frederick Jelinek pioneered the application of information theory to speech recognition with great success. I think it's only now, with speech recognition being somewhat serviceable but making high-level mistakes, that linguistic knowledge would come to the forefront.
Fabrizio Genovese said:
I spent an entire chapter of my thesis disagreeing with Chomsky about this. I don't think recursion is a key feature of human language, and frankly I don't think the Chomskyan approach to language will get us very far.
Oh dear. TIL. :sweat_smile:
I don't think we disagree. In fact, I really enjoyed your discussion about the Pirahã. I think I was reaching for some concept like interacting feedback loops and landed on recursion instead, sorry.
I really love Everett's approach to linguistics. Language is a tool, and like any tool it is created to solve problems
This perspective has the virtue of also taking into account the context in which a language develops. For instance, Pirahã can be completely whistled, and this most likely developed as a solution to the problem of hunting and communicating over long distances
Also, I think that the approach of having "one monolithic theory that describes every particular instance of something" is a remnant of the 19th century, so in this sense I don't understand Chomsky's effort to differentiate human and animal communication in terms of, say, recursion. The whole point of the "universal grammar" approach escapes me...
Definitely agree that there's no clear break between human and animal, but I also don't think of language as a tool; its embodiment is, but the fundamental ontology for me is a representation of reality, and grammar represents the interaction structures of that reality.
But the need to represent reality itself serves a purpose. Many living things don't do this, while humans somehow found a way to represent reality so that these representations can be shared. If this didn't serve any purpose, we would most likely have lost this ability by now
When I say "language as a tool", what I really focus on is this: there are many different languages. Some are highly inflected, some aren't inflected at all. Some can even be whistled, others have a very small phonetic inventory, etc. I don't think these differences arise by chance. Instead, I think they arise because a language with some features is more efficient than another with different features in a given context.
The language of math is probably the greatest example of this. As soon as we get "stuck" trying to communicate something, we design new mathematical tools to do so. :slight_smile:
Fabrizio Genovese said:
So the original idea of DisCoCat is "we tag every word with an element of the pregroup (noun, sentence, whatevs) and then we use the pregroup rules to infer the wiring". The reason you can't do this is that you can prove the vector space you get is trivial (as Preller did). So we resort to generating this huge category with all the words in it. Then things clearly work, but I guess you lose the original purpose of the idea, which was separating tags from semantics and encoding the tagging system in the functor. Now tags, semantics and tagging systems are all more or less conflated in the free CCC you generate.
I'm not sure I understand this characterization of the free compact closed category. My understanding is that the words are still freely generated by grammatical types; the key difference between this category and a pregroup is that the free compact closed category is not a preorder (meaning that there can be more than one morphism between a pair of objects). To use @Bob Coecke's favorite phrase, this is a feature, not a bug, because it allows you to keep better track of the way in which the meanings of sentences are computed.