Category Theory
Zulip Server
Archive

You're reading the public-facing archive of the Category Theory Zulip server.


Stream: theory: applied category theory

Topic: DisCoCat and the Grothendieck construction


view this post on Zulip Jade Master (Oct 06 2020 at 18:18):

Fabrizio Genovese said:

Bob Coecke claims to have solved this problem in the last paper. The idea of describing verbs as "applying things to nouns" is nice because it gets rid of this sentence space altogether and allows you to live only in the spaces you already have, but I don't know if it works for any verb. About the functor, the first DisCoCat paper was great because there wasn't any functor, and relationships between grammar/semantics were done using products. So yes, it was a unique category but things were still neatly separated, somehow. I still think that a functor Semantics -> Grammar is useful, not from the NLP point of view, but from a linguistic perspective. It is also a nice point to tackle a lot of problems in Applied Category Theory that pop up pretty much everywhere, but are more pronounced for language, such as the fact that categories are more or less bad at dealing with exceptions.

Something I'd like to point out is the following:

a DisCoCat is often thought of as a monoidal functor

$$F : C \to \mathsf{Vect}$$

where $$C$$ is the free compact closed category on a set of grammatical types and $$\mathsf{Vect}$$ is the category of real vector spaces made monoidal with the tensor product. (Fun exercise, btw: prove that a monoidal functor also preserves compact closed structure when it is present.) There is some debate about exactly what sort of category $$C$$ should be. Alternatively you could choose $$C$$ to be the free pregroup on a set... but, as Anne Preller proved, a monoidal functor from one of these can only be trivial, so the usual next step is to wave your hands and claim that it all morally works.

You can compose $$F$$ with the forgetful functor $$U : \mathsf{Vect} \to \mathsf{Set}$$ to get a functor

$$U \circ F : C \to \mathsf{Set}$$

Functors of this sort are equivalent to discrete opfibrations

$$X \to C$$

via the Grothendieck or "category of elements" construction

$$\int : [C, \mathsf{Set}] \xrightarrow{\cong} \mathsf{DiscOpFib}(C)$$

I think that $$\int (U \circ F)$$ is morally what Prof Coecke is talking about when he talks about one category with the grammar and semantics in the same place. The objects of $$\int (U \circ F)$$ are pairs $$(x, v)$$ where $$x$$ is a product of grammatical types and $$v$$ is a vector in the tensor product of their meaning spaces.

So my point is that thinking of the grammar and semantics in one place doesn't really fix the issue of having the grammar category be a pregroup. Every property in the definition of "monoidal functor" corresponds to some desirable property of "the category with grammar and semantics in the same place".

So if you give up the approach of seeing $$F$$ as a monoidal functor, then you must give up the above things as well!
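For readers who want it spelled out, the category of elements used here is the standard construction and nothing in it is specific to DisCoCat:

$$\mathrm{Ob}\left(\int (U \circ F)\right) = \left\{ (x, v) \mid x \in \mathrm{Ob}(C),\ v \in (U \circ F)(x) \right\}$$

$$\int (U \circ F)\big((x, v), (y, w)\big) = \left\{ f : x \to y \text{ in } C \mid (U \circ F)(f)(v) = w \right\}$$

with composition and identities inherited from $$C$$; the projection $$(x, v) \mapsto x$$ is the discrete opfibration $$\int (U \circ F) \to C$$ mentioned above.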

view this post on Zulip Fabrizio Genovese (Oct 06 2020 at 19:58):

The problem is that there is no notion of functor $$F$$, to my knowledge, for which $$C$$ is a satisfying notion of grammar

view this post on Zulip Fabrizio Genovese (Oct 06 2020 at 19:59):

This is exactly because of what Preller showed, as you pointed out. On the contrary, building $$C$$ as freely generated over a dictionary (as Preller and Lambek themselves suggest as a fix) is simply as far as you can get from what a linguist means when they say "grammar"

view this post on Zulip Fabrizio Genovese (Oct 06 2020 at 20:00):

I am very sure that grammar should emerge from semantics, and not the other way around. That is, if one wants to chase a functor, it should go from $$\mathsf{Vect}$$ to $$C$$, for some (pregroup?) $$C$$.

view this post on Zulip Fabrizio Genovese (Oct 06 2020 at 20:01):

I have some intuition about how to do it. Someone I know is planning to write her master's thesis on this tho, so I'm not making any progress until then. :smile:

view this post on Zulip Fabrizio Genovese (Oct 06 2020 at 20:02):

(Everything is obviously made more complicated by the fact that I'm not in academia, so supervision is more complicated, but that's something for another time)

view this post on Zulip Jules Hedges (Oct 06 2020 at 20:07):

Fabrizio Genovese said:

simply as far as you can get from what a linguist means when they say "grammar"

Why? Morphisms of the free autonomous category are grammatical derivations, right? Sounds like grammar to me

view this post on Zulip Jade Master (Oct 06 2020 at 20:12):

Fabrizio Genovese said:

This is exactly because of what Preller showed, as you pointed out. On the contrary, building $$C$$ as freely generated over a dictionary (as Preller and Lambek themselves suggest as a fix) is simply as far as you can get from what a linguist means when they say "grammar"

I don't think that's true. If you lose the compact closed structure you can encode the reduction rules of a grammar as a pre-net (i.e., if a*b reduces to c then include a transition ab -> c). The free monoidal category on this pre-net gives a reasonable (but bare-bones) model of a grammar. A monoidal functor from this grammar to $$\mathsf{Vect}$$ gives a semantic interpretation of this grammar.
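A minimal sketch (not from the thread) of what that pre-net encoding could look like in code; the grammatical types N, Adj, TV, S and the helper `reduces_to` are made up for illustration:

```python
# Hypothetical toy encoding of the pre-net idea: a transition rewrites a
# contiguous block of grammatical types, e.g. "N TV N -> S".  Reachability
# under such rewrites corresponds (roughly) to the existence of a morphism
# in the free strict monoidal category generated by these transitions.

TRANSITIONS = {
    ("N", "TV", "N"): ("S",),  # noun . transitive verb . noun  reduces to  sentence
    ("Adj", "N"): ("N",),      # adjective . noun  reduces to  noun
}

def reduces_to(word, target, transitions=TRANSITIONS):
    """Brute-force search: can `word` (a tuple of types) be rewritten into
    `target` by repeatedly replacing a matching segment with a transition's
    output?"""
    seen, stack = set(), [tuple(word)]
    while stack:
        current = stack.pop()
        if current == tuple(target):
            return True
        if current in seen:
            continue
        seen.add(current)
        for inp, out in transitions.items():
            n = len(inp)
            for i in range(len(current) - n + 1):
                if current[i:i + n] == inp:
                    stack.append(current[:i] + out + current[i + n:])
    return False

print(reduces_to(("Adj", "N", "TV", "N"), ("S",)))  # True: e.g. "big dogs chase cats"
print(reduces_to(("TV", "N"), ("S",)))              # False: no subject
```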

view this post on Zulip Jade Master (Oct 06 2020 at 20:17):

Maybe the free monoidal category isn't fancy enough to be satisfying and I understand that. Regardless I think it's a useful conceptual framework.

view this post on Zulip Chad Nester (Oct 06 2020 at 20:19):

To interject with a question that is only tangentially related and possibly unwelcome...

Why is $$\mathsf{Vect}$$ a reasonable semantic domain for natural language?

I’ve seen a lot of talks on DisCoCat and this has never made sense to me, but it’s never felt appropriate to ask.

view this post on Zulip Jules Hedges (Oct 06 2020 at 20:21):

aka "the category of things that go fast on your graphics card"

view this post on Zulip Jade Master (Oct 06 2020 at 20:21):

That's a reasonable question which is welcome to me at least.

view this post on Zulip Jules Hedges (Oct 06 2020 at 20:24):

Because the dimension is pushed as high as you possibly can, anything nonlinear is probably impossible to actually run

view this post on Zulip Jade Master (Oct 06 2020 at 20:24):

Lots of people like @Martha Lewis think it's not the best choice (is that right, Martha?). If I remember correctly, the idea is that a better category for semantics might be something like the category of convex relations. You can find more info about that in this paper: https://arxiv.org/abs/1703.08314

view this post on Zulip Jade Master (Oct 06 2020 at 20:27):

Jules Hedges said:

Because the dimension is pushed as high as you possibly can, anything nonlinear is probably impossible to actually run

This complaint I'm not sure I get. DisCoCat isn't supposed to be something you directly turn into code right? My understanding is that it's a way to frame the mathematical problem which might inspire code within that framework.

view this post on Zulip John Baez (Oct 06 2020 at 20:30):

I think prematurely stuffing something into the framework of linear algebra, before you've figured out what the math should be, is dangerous. But it's also extremely common.

view this post on Zulip John Baez (Oct 06 2020 at 20:31):

I think it's better to slow down a bit, figure out the right math for a subject, and then figure out how to do approximations that make it efficient to compute.

view this post on Zulip John Baez (Oct 06 2020 at 20:32):

For example gauge field theory was formulated long before people started finding efficient algorithms in "lattice gauge theory" to do the computations - like, compute the mass of the proton from first principles!

view this post on Zulip John Baez (Oct 06 2020 at 20:33):

But I think in a subject like linguistics people are so unsure that there is a "right" way to do things that they're more likely to seek a computationally convenient framework just to get anything to work at all.

view this post on Zulip Chad Nester (Oct 06 2020 at 20:35):

I think the issue I’m having with $$\mathsf{Vect}$$, which seems to persist if we move to convex relations, is that this picture of language is completely divorced from reality (in a literal sense).

view this post on Zulip John Baez (Oct 06 2020 at 20:35):

Do you know a framework that's not "completely divorced from reality"?

view this post on Zulip Chad Nester (Oct 06 2020 at 20:39):

I’ve encountered ideas of what language is that are not, at least in the sense I mean, and frankly I don’t see how it could be any other way: language describes.

view this post on Zulip Chad Nester (Oct 06 2020 at 20:39):

I can’t give you a mathematical framework for this sort of thing, although I do think this is a very important question.

view this post on Zulip Chad Nester (Oct 06 2020 at 20:40):

I don’t think “It’s all we have!” Is a terribly good justification.

view this post on Zulip John Baez (Oct 06 2020 at 20:45):

Neither do I, but if you're saying one approach to linguistics is "completely divorced from reality", I'm curious if you like the other approaches better: there are lots.

view this post on Zulip John Baez (Oct 06 2020 at 20:45):

I don't think the main purpose of language is to "describe", by the way.

view this post on Zulip John Baez (Oct 06 2020 at 20:45):

I believe people talk mainly to get other people to do things.

view this post on Zulip John Baez (Oct 06 2020 at 20:47):

But anyway, "the main purpose(s) of language" is something linguists have been discussing for a long time.

view this post on Zulip Chad Nester (Oct 06 2020 at 20:49):

I certainly don’t know what the “main purpose” (does that even make sense?!) of language is, but it must in any case “describe”. How else would you communicate what you want someone to do, for example?

view this post on Zulip Jules Hedges (Oct 06 2020 at 20:50):

I always thought of DisCoCat as natural language processing (aka NLP) rather than linguistics. Although I think different people have different ideas on that distinction

view this post on Zulip John Baez (Oct 06 2020 at 20:50):

Well, I'm saying that any approach to language that can't handle utterances like

"Wow!"

or

"Get out of my face!"

is not really handling language as spoken by ordinary people.

I don't think these utterances are mainly descriptive.

view this post on Zulip Chad Nester (Oct 06 2020 at 20:51):

That’s a good point.

view this post on Zulip Chad Nester (Oct 06 2020 at 20:53):

@Jules Hedges fair enough

view this post on Zulip Jules Hedges (Oct 06 2020 at 20:54):

Think of it as a physicist would. A model that's useful only in some restricted domain can still be useful

view this post on Zulip Chad Nester (Oct 06 2020 at 20:55):

I see that from that angle my objections are pretty unfair haha.

view this post on Zulip Morgan Rogers (he/him) (Oct 06 2020 at 20:56):

In linguistics, the technical area studying "what (a given instance of language) is for" falls under pragmatics. This is an echelon of language comprehension above semantics; if semantics describes the information content of a sentence on its own, then it's bound to take some extra work to incorporate the extra information contained in the sentence's surroundings.

view this post on Zulip Jules Hedges (Oct 06 2020 at 20:57):

I guess NLP benchmark datasets just don't contain interjections like that or other ""weird"" stuff. (Or only some of them do so you get to ignore those ones). I don't know whether that's because the benchmark datasets only contain the stuff that you need for the applications they have in mind (I think typically stuff like automatic sentiment analysis), or rather that nobody bothered to include stuff in the datasets that nobody had any idea how to handle

view this post on Zulip Morgan Rogers (he/him) (Oct 06 2020 at 21:00):

[Mod] Morgan Rogers said:

In linguistics, the technical area studying "what (a given instance of language) is for" falls under pragmatics.

That page actually contains a lot of aspects of language that look hard to tackle mathematically! NLP has its work cut out...

view this post on Zulip Chad Nester (Oct 06 2020 at 21:04):

Forgetting for a moment our logical training, isn’t it sort of bizarre to consider an utterance “out of context” at all?

view this post on Zulip Morgan Rogers (he/him) (Oct 06 2020 at 21:06):

It seems like a natural starting point, no?

view this post on Zulip Chad Nester (Oct 06 2020 at 21:07):

Personally I suspect that if we start there then we’ve already lost so much of what language is “about” that we’re talking about something else. Syntax with no semantics.

view this post on Zulip Chad Nester (Oct 06 2020 at 21:08):

(I do realise that in this direction lies quite a lot of what we all do!)

view this post on Zulip John Baez (Oct 06 2020 at 21:10):

Mathematics is an attempt to create an idealized language that's mainly descriptive - sentences are supposed to be true or false - and where the sentences make sense out of context, except for the context of the overall system they inhabit, like "the language of Peano arithmetic".

view this post on Zulip John Baez (Oct 06 2020 at 21:11):

These simplifications make it easier to understand mathematical language mathematically than ordinary language.

view this post on Zulip John Baez (Oct 06 2020 at 21:12):

And indeed one of the reasons for "regimenting" mathematical language this way was to make it easier to study mathematically! (Meta-mathematics.)

view this post on Zulip John Baez (Oct 06 2020 at 21:12):

While this is great, programming languages bust out of this framework (e.g. imperative languages can say "do this", and have a richer system of contexts) and human language busts out of it even more.

view this post on Zulip Morgan Rogers (he/him) (Oct 06 2020 at 21:14):

Chad Nester said:

Personally I suspect that if we start there then we’ve already lost so much of what language is “about” that we’re talking about something else. Syntax with no semantics.

We don't remove context completely; rather, we break things down to the basic building blocks (usually individual words) and analyse those pieces. They acquire relative meaning from their relationships to one another; a Vect-model can learn from a relatively small dataset that amongst nouns, "cat" and "dog" are semantically similar, because they turn up around similar words and phrases, even without having a model of what "real" things those words might be attached to.
What's missing is the interactions with non-language, and by their very nature such interactions are difficult to fit into a formalism describing language.

view this post on Zulip Chad Nester (Oct 06 2020 at 21:18):

It’s just so.... flat :o

view this post on Zulip Chad Nester (Oct 06 2020 at 21:18):

(I have to sign off for now but will check back! — thanks everyone for the discussion so far!)

view this post on Zulip Morgan Rogers (he/him) (Oct 06 2020 at 21:26):

John Baez said:

While this is great, programming languages bust out of this framework (e.g. imperative languages can say "do this", and have a richer system of contexts)...

This is a great intermediate example that could be helpful: how does one handle context in computer science? One has the advantage that the interaction between a computer and the outside world is very controlled: there are inputs and outputs, and we know in what form the information will enter and leave through these; in particular there are manageable limits on them. Natural language has the whole world under its purview, so it's an overwhelming task to try to handle contexts as if they were just inputs and outputs, but structurally it's not too much of a stretch. We just need to decide which ingredients to start with and not be too ambitious.

view this post on Zulip Fabrizio Genovese (Oct 07 2020 at 10:12):

Jules Hedges said:

Fabrizio Genovese said:

simply as far as you can get from what a linguist means when they say "grammar"

Why? Morphisms of the free autonomous category are grammatical derivations, right? Sounds like grammar to me

A grammar generated over a dictionary is not a grammar. It's literally taking all the language and calling it a grammar.

view this post on Zulip Fabrizio Genovese (Oct 07 2020 at 10:13):

Chad Nester said:

To interject with a question that is only tangentially related and possibly unwelcome...

Why is $$\mathsf{Vect}$$ a reasonable semantic domain for natural language?

I’ve seen a lot of talks on DisCoCat and this has never made sense to me, but it’s never felt appropriate to ask.

I don't think it is. But it allows you to build meaning out of statistical correlations between words, which is something you can extract automatically from data. All in all, vector spaces are one of the few categories that allow you to build a semantics that is not a toy.

view this post on Zulip Fabrizio Genovese (Oct 07 2020 at 10:18):

In any case, when it comes to fields like linguistics, I think we should put content before categories. Yes, taking "free categories" solves things categorically. But people have been thinking formally about what language is since Ferdinand de Saussure at the beginning of the last century. Basically giving up all that just because you want a compact closed category around is short-sighted (if not insulting to the many people who have worked in that field before).

view this post on Zulip Fabrizio Genovese (Oct 07 2020 at 10:21):

With this I mean that either we do NLP, as Bob did in his last paper, and give up on this "grammar -> semantics" functor altogether, or we don't, but in that case we should maybe give up on calling "grammar" something that is not a grammar. The whole point of using pregroups is that they have been shown, if I recall correctly, to be equivalent to context-free grammars. The great advantage of DisCoCat was exactly that it took something linguists liked very much (context-free grammars) and related it to something computer scientists doing NLP liked very much (vector spaces). If you have to replace pregroups with something else, frankly all the niceness of the framework goes away in my opinion.

view this post on Zulip Morgan Rogers (he/him) (Oct 07 2020 at 12:03):

I would already be putting "syntax" as the domain of the functor rather than grammar. Would that be an acceptable compromise to you @Fabrizio Genovese, since syntax can take many forms?

view this post on Zulip Fabrizio Genovese (Oct 07 2020 at 12:05):

It would be, but I think it is fundamental to ask "what is syntax in the context of language"?

view this post on Zulip Fabrizio Genovese (Oct 07 2020 at 12:05):

And well, the first to give an answer was Panini 2500 years ago, roughly

view this post on Zulip Fabrizio Genovese (Oct 07 2020 at 12:05):

He even invented pushdown automata in the process, so we could say that his answer was already quite formal.

view this post on Zulip Fabrizio Genovese (Oct 07 2020 at 12:08):

What I am trying to say is that any compromise, to be acceptable, should entail studying all (or some) of these efforts, and trying to recast them with category theory. It is very easy to throw away the baby with the bath water otherwise, which is exactly what I think happens when one ditches pre-groups for free compact closed categories.

view this post on Zulip Fabrizio Genovese (Oct 07 2020 at 12:09):

So, when you say "syntax", the natural question should be "does your definition of syntax actually say or capture something interesting that happens in language?"

view this post on Zulip Fabrizio Genovese (Oct 07 2020 at 12:10):

John already pointed out some limitations of the context-free setting, namely, the difficulty of modelling things such as utterances, interjections and the like.

view this post on Zulip Fabrizio Genovese (Oct 07 2020 at 12:11):

One of the problems there, imho, is that such elements are more context-sensitive than other common elements of language. "Wow" can be used to mean sincere amazement or that you are seriously pissed off about something. This is very context-dependent, and it is not surprising that you'll have a hard time describing this with context-free grammars. So yes, there's definitely margin for improvement

view this post on Zulip Fabrizio Genovese (Oct 07 2020 at 12:14):

I can think of a ton of things that happen in language that are really difficult to capture formally, even if endeavors have been made: sentence tone and rhythm, pitch accent, the synthetic vs. analytic characteristics of languages, all the way up to corner cases such as Pirahã, which does not have recursion, and hence cannot really be modelled by a context-free language or anything more powerful than that

view this post on Zulip Fabrizio Genovese (Oct 07 2020 at 12:17):

This entails another big problem: it is difficult to say what the "syntax of language" is without saying what language is. And, at least in my opinion, all the answers given to this question are quite anglo-centric, or at best western-centric. So we should also keep in mind that, in defining what "grammar" would even be, even categorically, we are starting from a very biased perspective. The reason why stuff like tones and accents is difficult to capture by "traditional" formal methods is that these are not features that are heavily present in English, and so they may appear to be of secondary importance to an Indo-European speaker

view this post on Zulip Fabrizio Genovese (Oct 07 2020 at 12:20):

All in all, I think that focusing on "how can I make this into a functor" is not the right kind of question. There is quite a lot of stuff to study before that, and quite a lot of philosophical, conceptual reasoning to do before even trying to set up the problem, depending mainly on the degree of generality you want to operate at. :smile:

view this post on Zulip Fabrizio Genovese (Oct 07 2020 at 12:26):

Anyway, if we consider grammar in a traditional sense, I am confident that the functor should go the other way around. There are a lot of reasons for this, but in general the idea of using grammar to "orchestrate meaning of language" as in DisCoCat is made complicated by the fact that you can bend grammar pretty much in any way you like and still convey meaning. "I hungry please food now" is totally understandable yet not grammatical.

view this post on Zulip Fabrizio Genovese (Oct 07 2020 at 12:27):

Grammar, as a "generally useful description of how parts of language interact with each other" should emerge from syntax and meaning "up to epsilon", not the other way around. This is why I'm critical of the functorial approach. I think it is too stiff.

view this post on Zulip Morgan Rogers (he/him) (Oct 07 2020 at 12:59):

Fabrizio Genovese said:

Anyway, if we consider grammar in a traditional sense, I am confident that the functor should go the other way around. There are a lot of reasons for this, but in general the idea of using grammar to "orchestrate meaning of language" as in DisCoCat is made complicated by the fact that you can bend grammar pretty much in any way you like and still convey meaning. "I hungry please food now" is totally understandable yet not grammatical.

I don't get how this is an argument for the functor going the other way. The "syntax -> semantics" or "grammar -> semantics" functor is one that translates [instance of language] into "meaning of [instance of language]". Determining precisely where (in what category, say) "meaning" should live is a big hard question, but even in day to day experience we find plenty of experiences, i.e. "meaning", that language is not equipped to express, so a mapping from meaning to language doesn't seem like a feasible direction for this relationship to go. What did you have in mind?

view this post on Zulip Fabrizio Genovese (Oct 07 2020 at 13:04):

The key here is "up to epsilon", I'm DMing you if that's ok

view this post on Zulip Robin Piedeleu (Oct 07 2020 at 13:28):

Chad Nester said:

Why is $$\mathsf{Vect}$$ a reasonable semantic domain for natural language?

I’ve seen a lot of talks on DisCoCat and this has never made sense to me, but it’s never felt appropriate to ask.

As far as I understand, it grew out of a partially-verified empirical claim about the way that words are distributed in text. Others here know this better than I do and should correct me if I'm wrong.

The idea is that, from large corpora of text, you can compress essential language statistics down to vector representations for single words. One easy way is to compute a big array whose $$(w, w')$$-entry is the number of times that words $$w$$ and $$w'$$ appeared together, for some predefined notion of context (usually a window of surrounding words, the same sentence, etc.). After some post-processing step like PCA or SVD you obtain a decomposition of this matrix into more manageable factors, from which you can get a vector for each word in your corpus. These days, from what I understand, language models (GPT-3 for example) are generative models: they model the probability distribution of sequences of words. Usually this is obtained by training a (deep) neural net to predict the occurrence of words in context. Then the vector representation of a word is extracted from the parameters of the neural network used to approximate the distribution.

You might object that these are just arrays of numbers and not really vectors, since we make no use of the linear structure. And Discocat appeals to linearity in a crucial way, in the sense that the meaning of a sentence is computed by contracting a bunch of tensors representing the meaning of individual words or phrases. But where do these tensors come from? And why does it make any sense to compose them following the shape of a grammatical derivation to compute the meaning of sentences? Here, Discocat relies on the assumption that, somehow, the vector representations obtained from the statistical data match the shape of the grammar. But this is a non-trivial empirical claim. It turns out that it is partially justified, for very simple cases where the vector representations capture some semantic relationships between words as linear dependencies, e.g., $$v_{king} - v_{man} + v_{woman} \approx v_{queen}$$. I think this suffices to justify the way that certain simple adjectives are handled in Discocat, and some of the papers have shown how to handle several toy examples. But to justify the way that the meaning of more complex sentences is derived, one would need to make sure that the distributional model contains much more complex higher-rank linear dependencies, in order to guarantee that, say, a transitive verb acts like a bilinear map, as it is supposed to. I know that people have worked on building (or discovering) these higher-rank tensors from statistical data, but imho much more is still needed to justify the underlying assumption that it makes sense to compute the meaning of sentences using this form of composition (a functor from grammar to linear semantics).

Of course, another route is to change the model completely, going beyond distributional data to more structured representations.
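As a concrete, entirely toy illustration of the count-then-factorize pipeline described above, here is a minimal numpy sketch; the corpus, window size, and dimension are made up, and the resulting numbers are only illustrative:

```python
# Toy distributional semantics: co-occurrence counts -> truncated SVD ->
# word vectors -> cosine similarity.  Nothing here is tuned or realistic.
import numpy as np

corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat chased the dog",
    "a king rules a kingdom",
    "a queen rules a kingdom",
]
tokens = [sentence.split() for sentence in corpus]
vocab = sorted({w for sent in tokens for w in sent})
index = {w: i for i, w in enumerate(vocab)}

# Count how often w' appears within a +/- 2 word window of w.
window = 2
counts = np.zeros((len(vocab), len(vocab)))
for sent in tokens:
    for i, w in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if j != i:
                counts[index[w], index[sent[j]]] += 1

# Factor the count matrix; rows of the truncated U * S are the word vectors.
U, S, _ = np.linalg.svd(counts, full_matrices=False)
dim = 3
vectors = U[:, :dim] * S[:dim]

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

# Words used in similar contexts ("cat"/"dog") should come out more similar
# than unrelated ones ("cat"/"kingdom"), even on this tiny corpus.
print(cosine(vectors[index["cat"]], vectors[index["dog"]]))
print(cosine(vectors[index["cat"]], vectors[index["kingdom"]]))
```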

view this post on Zulip Chad Nester (Oct 08 2020 at 06:45):

@Robin Piedeleu Thanks for this answer!

view this post on Zulip Valeria de Paiva (Oct 08 2020 at 16:48):

Jules Hedges said:

I guess NLP benchmark datasets just don't contain interjections like that or other ""weird"" stuff. (Or only some of them do so you get to ignore those ones). I don't know whether that's because the benchmark datasets only contain the stuff that you need for the applications they have in mind (I think typically stuff like automatic sentiment analysis), or rather that nobody bothered to include stuff in the datasets that nobody had any idea how to handle

This is NOT true! The Universal Dependencies framework (worked on by Stanford, Google, and more than 300 individual contributors -- UD is an open community effort producing more than 150 treebanks in 90 languages: https://universaldependencies.org/) certainly has things like "Wow" and "Hey" and everything else that is difficult to model.

view this post on Zulip Valeria de Paiva (Oct 08 2020 at 17:09):

Robin Piedeleu said:

Chad Nester said:

Why is $$\mathsf{Vect}$$ a reasonable semantic domain for natural language?

I’ve seen a lot of talks on DisCoCat and this has never made sense to me, but it’s never felt appropriate to ask.

As far as I understand, it grew out of a partially-verified empirical claim about the way that words are distributed in text. Others here know this better than I do and should correct me if I'm wrong.

Great job summarizing the suggested explanation @Robin Piedeleu!
But personally, I say that it isn't a reasonable semantic domain for natural language at all.

The proposed explanation in terms of the Curry-Howard correspondence doesn't make sense to me: compact closed categories have no sense of negation (because conjunctions, disjunctions, and implications are all the same under the Curry-Howard correspondence in compact closed setups like Vect), and natural language is all about entailments and contradictions (which require negation and differences between the basic connectives!).
NLP in real life needs to pay attention to coverage (any theory whatsoever can do ten simple sentences; the problem is to do the NYT!) and, as Robin says, the vectors work very well for measuring similarities, but that's about all, AFAIC.
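To unpack the "all the same" remark with standard facts about compact closed categories (nothing here is specific to the NLP setting): a compact closed category is a degenerate model of multiplicative linear logic, because the "par" connective and the linear implication both reduce to the tensor,

$$(A^* \otimes B^*)^* \;\cong\; A \otimes B \qquad\text{and}\qquad A \multimap B \;\cong\; A^* \otimes B,$$

so multiplicative conjunction, disjunction, and implication are all built from the same tensor and the duality $$A \mapsto A^*$$, which is one way to state the point that the connectives cannot be distinguished at the type level.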

view this post on Zulip Robin Piedeleu (Oct 09 2020 at 15:32):

Thanks @Valeria de Paiva!

I agree with your remark about logic in compact-closed categories. However, as I understand it, Discocat places logic at a different level: logical connectives should just be words and therefore their interpretation is not at the type level but at the semantic level. Concretely, negation should be a linear endomap on the sentence space, conjunction a bilinear map, etc. Of course it is really unclear exactly which maps they should be, which is why your objection remains completely valid: there is no easy way to accommodate the usual logical connectives in the standard distributional semantics. In fact, I think this was one of @Bob Coecke's main reasons to change the semantics to something more structured, the first candidate being density matrices, inspired by quantum physics.

Your comment also reminded me of an important point I failed to problematise earlier: how do we evaluate what a reasonable semantics for natural language is? As someone who is interested in NLP and linguistics mostly from the outside, I have no idea. The traditional NLP approach seems to be all about improving scores at performing certain tasks, like reference resolution, question answering, relationship extraction, fooling people into thinking they are reading content produced by another human, what have you. The modern answer is that generative models can perform really well at all of these tasks and more (provided that you have enough computing power to train humongous neural nets) so that getting a good model of the distribution of word sequences is a good semantics (if we stay within the realm of text processing at least---I know nothing about speech processing). Discocat seems to be focused on something else: in Discocat, semantics captures similarity of text fragments, e.g., two sentences have similar meaning if they represent close vectors in sentence space, whatever that is. This seems sensible but it is not at all obvious to me how we can use it to perform traditional NLP tasks better. But maybe that is not the aim. Maybe we're just trying to understand better how language works, which aspects are genuinely compositional vs. which ones are not (but then, can that ever be divorced from predictive power?). I'm rambling now so I'll stop, but I would be happy to hear the opinion of people who work directly on this (@Alexis Toumi ?).

view this post on Zulip Bob Coecke (Oct 09 2020 at 15:51):

DisCoCat has nothing directly to do with Vect, cf. the first papers where we also used Real as an example model, and later papers with conceptual spaces, density matrices, etc. See e.g. also this more recent paper https://arxiv.org/abs/1904.03478 where framework and model are clearly distinct. That said, FVect is where most current practical NLP takes place, so ignoring that model would be stupid. For quantum computing the argument is even stronger.

view this post on Zulip Bob Coecke (Oct 09 2020 at 15:56):

This is the real fun you get out of working with vector spaces. This is a general comment about ACT: don't just talk about applied stuff, do it!
https://medium.com/cambridge-quantum-computing/quantum-natural-language-processing-748d6f27b31d

view this post on Zulip Valeria de Paiva (Oct 09 2020 at 20:08):

Robin Piedeleu said:

Your comment also reminded me of an important point I failed to problematise earlier: how do we evaluate what a reasonable semantics for natural language is? As someone who is interested in NLP and linguistics mostly from the outside, I have no idea. The traditional NLP approach seems to be all about improving scores at performing certain tasks, like reference resolution, question answering, relationship extraction, fooling people into thinking they are reading content produced by another human, what have you. The modern answer is that generative models can perform really well at all of these tasks and more (provided that you have enough computing power to train humongous neural nets)

Well, I beg to differ again. The generative models do NOT perform really well at all of these tasks, without a huge hedge. They perform really well within their training: there's a collection of literature showing that they cannot generalize, that their impressive results go down the drain with tiny adversarial modifications that show that they did not understand the contents at all -- some of this new literature is collected in this blog post (https://logic-forall.blogspot.com/2020/03/artifacts-in-nlp.html) from March 2020. But my checking of the literature was done in Nov 2019, and since then the number of papers explaining the failings of the so-called "super-human" performance has increased.

view this post on Zulip Jade Master (Oct 10 2020 at 22:32):

Bob Coecke said:

DisCoCat has nothing directly to do with Vect, cf. the first papers where we also used Real as an example model, and later papers with conceptual spaces, density matrices, etc. See e.g. also this more recent paper https://arxiv.org/abs/1904.03478 where framework and model are clearly distinct. That said, FVect is where most current practical NLP takes place, so ignoring that model would be stupid. For quantum computing the argument is even stronger.

Hi Dr Coecke, nice to see you. Do you have any comment regarding what I said at the beginning of this topic?

view this post on Zulip Bob Coecke (Oct 11 2020 at 11:03):

Jade Master said:

Bob Coecke said:

DisCoCat has nothing directly to do with Vect, cf. the first papers where we also used Real as an example model, and later papers with conceptual spaces, density matrices, etc. See e.g. also this more recent paper https://arxiv.org/abs/1904.03478 where framework and model are clearly distinct. That said, FVect is where most current practical NLP takes place, so ignoring that model would be stupid. For quantum computing the argument is even stronger.

Hi Dr Coecke, nice to see you. Do you have any comment regarding what I said at the beginning of this topic?

Hi Jade, this is exactly what we did in the 1st paper:
http://www.cs.ox.ac.uk/people/stephen.clark/papers/qai08.pdf
and I agree (for many reasons) that this may be the better view. The only reason we changed is that people seemed to prefer the functor representation.

view this post on Zulip Jade Master (Oct 11 2020 at 15:38):

Bob Coecke said:

Jade Master said:

Bob Coecke said:

DisCoCat has nothing directly to do with Vect, cf. the first papers where we also used Real as an example model, and later papers with conceptual spaces, density matrices, etc. See e.g. also this more recent paper https://arxiv.org/abs/1904.03478 where framework and model are clearly distinct. That said, FVect is where most current practical NLP takes place, so ignoring that model would be stupid. For quantum computing the argument is even stronger.

Hi Dr Coecke, nice to see you. Do you have any comment regarding what I said at the beginning of this topic?

Hi Jade, this is exactly what we did in the 1st paper:
http://www.cs.ox.ac.uk/people/stephen.clark/papers/qai08.pdf
and I agree (for many reasons) that this may be the better view. The only reason we changed is that people seemed to prefer the functor representation.

My point is that neither the functor approach nor the product space approach is better, because they are equivalent. The issues with the functor approach will also show up in the product space approach.

view this post on Zulip Bob Coecke (Oct 11 2020 at 19:52):

Jade Master said:

Bob Coecke said:

Jade Master said:

Bob Coecke said:

DisCoCat has nothing directly to do with Vect, cf. the first papers where we also used Real as an example model, and later papers with conceptual spaces, density matrices, etc. See e.g. also this more recent paper https://arxiv.org/abs/1904.03478 where framework and model are clearly distinct. That said, FVect is where most current practical NLP takes place, so ignoring that model would be stupid. For quantum computing the argument is even stronger.

Hi Dr Coecke, nice to see you. Do you have any comment regarding what I said at the beginning of this topic?

Hi Jade, this is exactly what we did in the 1st paper:
http://www.cs.ox.ac.uk/people/stephen.clark/papers/qai08.pdf
and I agree (for many reasons) that this may be the better view. The only reason we changed is that people seemed to prefer the functor representation.

My point is that neither the functor approach nor the product space approach is better, because they are equivalent. The issues with the functor approach will also show up in the product space approach.

I see. Well, then again, :) at the moment I typically first present the diagrams, and then fill them in with models, which avoids the problem, right? In some forthcoming paper with Vincent Wang we are moving entirely away from pregroups. In practice, you only used 1/2 of the pregroup for the grammatical structure anyway (cf. only cups, no caps). But you are very right to use the word "morally", cause really, that's what the functor/pairing provided, some moral.

view this post on Zulip Jade Master (Oct 11 2020 at 20:32):

Oh great. I'm excited to see what y'all do.

view this post on Zulip Robin Piedeleu (Oct 12 2020 at 11:08):

Valeria de Paiva said:

Well, I beg to differ again. The generative models do NOT perform really well at all of these tasks, without a huge hedge. They perform really well within their training: there's a collection of literature showing that they cannot generalize, that their impressive results go down the drain with tiny adversarial modifications that show that they did not understand the contents at all -- some of this new literature is collected in this blog post (https://logic-forall.blogspot.com/2020/03/artifacts-in-nlp.html) from March 2020. But my checking of the literature was done in Nov 2019, and since then the number of papers explaining the failings of the so-called "super-human" performance has increased.

Thanks for your reply and for the very interesting blog post! It is reassuring to see concrete evidence that distributional features obtained from generative models are insufficient to capture essential structural properties. It seems that logic still has an important role to play in NLP. Your post also points towards an answer to the question I (and implicitly @Chad Nester) asked before: how do we decide what is a reasonable semantics for natural language? You seem to suggest that possible answers are semantics that provide a basis for effective inference. I like the idea of inference as the mother of all NLP tasks! I wonder how far Discocat can be pushed in this direction. I know there is already some work on using density matrix representations for this purpose but, as @Bob Coecke pointed out, the framework is very flexible and could theoretically accommodate much more structured semantics.

view this post on Zulip Valeria de Paiva (Oct 13 2020 at 18:46):

Robin Piedeleu said:

I like the idea of inference as the mother of all NLP tasks!

So do I! Thanks!
But the other important criterion is really *coverage*: ten sentences, any theory can be made to do.