Here are two analogies:
"man is to woman as king is to queen"
and
"king is to man as queen is to woman"
There's a vector space model of concepts where these are written as equations
man − woman = king − queen
and
king − man = queen − woman.
You can draw the four differences here as the four sides of a parallelogram. The two equations are equivalent because a vector space is an abelian group. In a nonabelian group, written multiplicatively,
$ab^{-1} = cd^{-1}$
is not equivalent to
$ac^{-1} = bd^{-1}$.
So my main question is: are analogies generally 'abelian', or can you think of examples where
a is to b as c is to d
but not
a is to c as b is to d.
There's a general notion of [[heap]], which sheds some light on this. A heap has a ternary operation $t$ where $t(a,b,c)$ answers the question "$a$ is to $b$ as what is to $c$?" Any group gives a heap where
$t(a,b,c) = ab^{-1}c$,
and any heap is isomorphic to one coming from a group. But abelian groups give heaps satisfying the 'abelian' law
$t(a,b,c) = t(c,b,a)$.
So I'm wondering if analogies sometimes break this law.
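To make the heap operation concrete, here's a minimal sketch (a toy check, not from the original discussion) verifying in a small nonabelian group, $S_3$ acting on $\{0,1,2\}$, that the 'abelian' law $t(a,b,c) = t(c,b,a)$ really can fail:

```python
from itertools import permutations

def compose(p, q):
    """(p ∘ q)(i) = p(q(i)); permutations stored as tuples."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def t(a, b, c):
    """Heap operation t(a,b,c) = a b^{-1} c coming from the group."""
    return compose(a, compose(inverse(b), c))

S3 = list(permutations(range(3)))
violations = sum(t(a, b, c) != t(c, b, a) for a in S3 for b in S3 for c in S3)
print(f"{violations} of {len(S3)**3} triples violate t(a,b,c) = t(c,b,a)")
```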
Yes, it is the consensus among cognitive scientists that analogies are generally commutative (rotatable).
Consider a letter-string analogy problem
"abc" :: "abd"
"ijk" :: ???
Such analogy problems have many spurious solutions (see Melanie Mitchell's introduction). It's famous folklore that in a conversation between D. Hofstadter and R. Feynman, the latter insisted that the right solution is "abd" by Ockham-style reasoning. The counterarguments to Feynman's solution already involved some commutative square considerations (Feynman's reasoning would yield a different solution to the symmetric `abc :: ijk ::: abd :: ?`). Fluid Concepts and Creative Analogies, the first book sold on Amazon, deals with this in depth: the "dissatisfaction" value of Chapter 5 is essentially a measure of failure to commute when composing two relations.
For solving such analogies, some traditional AI systems try to synthesize pairs of functions `f, g : String -> String` where `f("abc") = "abd"` and `g("abc") = "ijk"` hold. A solution to the analogy is found when both `(g ∘ f)("abc")` and `(f ∘ g)("abc")` produce the same result. And a commutative square is a commutative square no matter which way you look at it.
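As a hedged illustration of this search-for-commuting-functions idea (my own toy DSL, not any particular system from the literature), here is a brute-force version for the `abc`/`abd`/`ijk` problem:

```python
# Toy DSL: each candidate program shifts either every letter or only the
# last letter by a fixed amount. We look for f with f("abc") = "abd" and
# g with g("abc") = "ijk", then read the answer off the common value of
# the two composites -- i.e. we insist the square commutes.

def shift(s, k):
    return "".join(chr((ord(ch) - ord("a") + k) % 26 + ord("a")) for ch in s)

def shift_last(s, k):
    return s[:-1] + shift(s[-1], k)

candidates = [(lambda s, op=op, k=k: op(s, k))
              for op in (shift, shift_last) for k in range(1, 26)]

fs = [f for f in candidates if f("abc") == "abd"]   # e.g. "shift the last letter by 1"
gs = [g for g in candidates if g("abc") == "ijk"]   # e.g. "shift every letter by 8"

for f in fs:
    for g in gs:
        if g(f("abc")) == f(g("abc")):
            print("commuting solution:", g(f("abc")))   # prints "ijl"
```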
This is not limited to toy domains like letter-string analogies, though. In the 2010s, Excel's generative autofill (FlashFill) was implemented using the very same principle (using each row as part of an "analogy" in an action synthesis problem). Some current close-to-SoTA solutions to Chollet's Abstraction and Reasoning Challenge (ARC) also use a variant of the same idea; however, this application requires a very, very good program synthesizer. A good LLM can mostly do the job, but you can't make the ARC leaderboard if you call out to an external LLM.
Moreover, this principle explains a part of the early success that word2vec-style embeddings and vector differences had in forming natural language analogies (king/queen//man/woman etc.), and how they relate to the earlier search-based GOFAI approaches: they solve an analogy `a :: b ::: c :: ?` by "synthesizing" the very simple programs $x \mapsto x + (b - a)$ and $x \mapsto x + (c - a)$, which commute by default. In contrast, it's really quite difficult to find commuting programs using unconstrained genetic programming or other common kinds of search.
(Cogsci literature on such topics is kinda hard to read, but I found it worth engaging with: they did think about things a lot, and the existing literature contains most ideas one would naively have in several cryptomorphic forms)
Why should one think that the alternative solutions ("ijl", say) are spurious rather than thinking that there are just several good answers?
@Evan Washington There _can_ be several good answers. The domain experts tend to agree that "ijd" is not a particularly good answer for this particular problem compared to "ijl", and they have written books of arguments about why they think that and what good solutions are (including a fair bit of empirical research about which letter-string analogies people find compelling). (More importantly, I misquoted Feynman's solution, which was actually that "ijk" maps to "abd" too (fixed now in edit), which is much more clearly spurious than "ijd".)
IIRC it was an interesting exercise to write down two short commuting Python programs which realize ("abc"->"abd", "abc"->"ijk") with their composites sending "abc"->"ijd", but I did it over 10 years ago so I can't vouch for that. The ones that realize "ijk" -> "ijl" are much easier.
The analogy below, however, is definitely nice to contemplate (especially if one has seen similar pushout/pullback squares).
nonlinguistic.png
Interesting stuff! We can write the first analogy as $f(a) = b$ and $f(c) = d$. That is, there is some process $f$ which produces both $b$ from $a$ and $d$ from $c$. The second analogy can be written as $g(a) = c$ and $g(b) = d$. I considered the special case where we can think of the first process as "evaluating a property $p$". So, I write the first analogy as $a.p = b$ and $c.p = d$. The idea is that the $p$-th property of $a$ is $b$ and the $p$-th property of $c$ is $d$.
Using this notation, $g(b) = d$ can then be written as $g(a.p) = c.p$. Since $c = g(a)$, this can be rewritten as $g(a.p) = g(a).p$. That is, the process $g$ "preserves the property $p$".
We can always achieve this situation by defining $g(a.p)$ to be $g(a).p$, after setting $g(a) = c$. And this seems to give reasonable analogies in some examples. For example, consider the analogy "tree is to leaf as body is to hand". We have tree.p = leaf and body.p = hand. Then we set g(tree) = body, and to finish our second analogy we also need to have g(tree.p) = body.p. We can achieve this by setting g(tree.p) = g(leaf) = hand. And so our second analogy ends up being "tree is to body as leaf is to hand", which seems reasonable.
In general, it seems plausible to expect that if x is analogous to y, then the pth part of x will be intuitively analogous to the pth part of y. I think this relates to the fact that we can often "swap kinds of things" and "swap between the whole and a specific part of things" in either order and get the same answer.
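A throwaway sketch of that commuting square, with made-up data (nothing canonical about these dictionaries): "take the p-th part" and "swap the kind of thing" can be applied in either order and land on the same concept.

```python
part = {"tree": "leaf", "body": "hand"}        # x  ->  x.p   (whole to part)
swap_kind = {"tree": "body", "leaf": "hand"}   # x  ->  g(x)  (plant to animal)

x = "tree"
print(part[swap_kind[x]])   # g(x).p  -> "hand"
print(swap_kind[part[x]])   # g(x.p)  -> "hand", same answer either way
```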
John Baez said:
So my main question is: are analogies generally 'abelian', or can you think of examples where
a is to b as c is to d
but not
a is to c as b is to d.
This is quite an interesting question. Given that the answer may also depend on data, context, and culture, there is a lot of room for debate. Therefore, I am not sure how my examples below will be received, but I am trying to throw out some ideas.
Suppose that you define:
car − dad = flowers − mom,
such that: A car is to dad as flowers are to mom.
But because cars are expensive, we could also have:
socks − dad = flowers − mom,
such that: In all practicality, socks are to dad as flowers are to mom.
Now, if the structure above were Abelian, we could rearrange to get car = dad + (flowers − mom) = socks. The problem is that we cannot represent car as the expression:
dad + (flowers − mom),
because car is contextual.
If this is of any interest to anyone, I have studied these questions in commutative idempotent monoids (i.e. where you have $x + y = y + x$ and $x + x = x$) in this paper. There, I provide a generalized linear algebra for commutative idempotent monoids to solve these kinds of questions (by defining an imaginary subtraction operation $x - y$, even though the monoid has no genuine inverses). My goal was to untangle complex relationships between haplotypes and genes in populations over generations. For these problems, I was handling the contextuality of these relations by using some sheaf-like functors on local parts of DNA.
[Thanks for linking that paper @Rémy Tuyéras! I still have a very long ways to go, but one of my main long term goals is to do math that studies the process of medical imaging reconstruction. I anticipate that this will involve thinking deeply about what it means to "untangle complex relationships". In my case this would involve studying the relationships between observations obtained in different ways (e.g. as we perturb the thing to be imaged differently, or as we take observations on different sensors or at different times). I assume that this medical imaging direction is quite different from what you describe in your paper. Nonetheless, I am still excited to someday read your paper to begin understanding some of the modern mathematical techniques and ideas that can be used to study complex relationships!]
here's an example (perhaps) of an analogy breaking the abelian law. in the (undirected) path graph c-a-b-d, it seems true that a is to c as b is to d (there is an edge between each pair). but it seems false that a is to b as c is to d (there is an edge between a and b but not between c and d).
and here's a natural language example (which you may find less plausible). Father is to father's father (paternal grandfather) as mother is to mother's mother (maternal grandmother), but it seems like it's not the case that father is to mother as paternal grandfather is to maternal grandmother. (It seems better, to me, to say that father is to mother as paternal grandfather is to paternal grandmother.)
That's a cool paper @Rémy Tuyéras ! I look forward to reading it more. Hopefully you won't mind some possibly superficial reflections.
It seems like you're pointing out that unlike with a simple binary operation as in a group, the category may have more than one object, and so moving along morphisms in certain directions requires one to specify more information for unique inverses.
So, Paris : France :: ____ : USA
could be solved by either Washington DC or New York, based on whether it is "capital" or "most recognizable city". There should be a way to encode the multi-dimensionality of the analogy into the coordinates of the map, so you would end up with a different map for each sense.
So in the case of "getting stereotypical gifts for Mom and Dad", you might end up with a separate map for each sense of gift-giving.
So, to go in the other direction: while the curried 1-argument maps (~actions) might be abelian and invertible, the currying process requires matching on the sense in which the other arguments are used, which we haven't done in the case of the degenerate example you cited (= too many maps end up on Flowers), e.g.
Flowers : Flowers : Breakfast-in-Bed : Mom :: Socks : Tie : Mug : Dad
is plausible but
Flowers : Flowers : Mom :: Socks : Car : Dad
is more clearly wrong.
It seems like Lawvere metric spaces might capture some of this? That allows the coordinates of normal vector spaces (which is what word2vec is doing in John's examples), while also allowing better level-shifting and more general values?
@Evan Washington Interesting examples! But I think in the undirected graph c-a-b-d the pairs (a,b) and (c,d) do look quite analogous to me. They are both pairs switched by an automorphism of the structure (so c,d in a sense share all properties; in particular their degrees).
Then again, I'm predisposed to look for automorphisms. But if somebody asks me to complete the analogy `a :: b ::: c :: ?` with the graph above, I'd think the most compelling solution I could find would still be d, by some margin, no matter what I considered.
I'd love to find a scenario where presented with `a :: b ::: c :: ?` most people would immediately want to answer X, but when presented with `a :: c ::: b :: ?` most people would immediately come up with some different Y instead.
Eric M Downes said:
It seems like you're pointing out that unlike with a simple binary operation as in a group, the category may have more than one object, and so moving along morphisms in certain directions requires one to specify more information for unique inverses.
This was not exactly where I was going, but I do see the connection, mostly if you see words as being able to change the context of a conversation while being said/used (which is very possible)
Speaking of embeddings (and transformers), I was wondering how chatgpt would interpret the two relationships, and I was quite surprised by the second answer:
Screenshot_20240627_161236_Chrome.jpg
Screenshot_20240627_161246_Chrome.jpg
Love that ChatGPT has learned to parrot equivocation boilerplate. The second answer seems to imply that people and flowers are, to an AI, essentially the same as tools. :)
I was also talking to chatGPT about this topic. It seemed to have difficulty on this and related topics! For example, we have this bold claim:
chatgpt.png
Moments like these help me remember that chatGPT4o still has significant limitations.
Well, this is true if you push the block really far at each step...
Re heaps themselves: I just remembered attending a talk by @Joe Razavi around 2018, where he presented something he tentatively called sherds at the time, which were categorial gadgets very closely related to torsors. In hindsight they might well have been categorified heaps. I don't think he published about them, but I hope he'll have some notes or slides to share.
Going back a long ways: thanks, @Zoltan A. Kocsis (Z.A.K.), for explaining some of the huge amount of work that has been done on these issues. The sheer mass of it makes me less inclined to think about this more... I like 'virgin territory': a topic that either people haven't studied yet, or, almost as good, a topic where I don't know what they've said. :upside_down:
(I'm going to revive my much-maligned upside-down smiley because only that emoticon accurately conveys my emotion here.)
For purely mathematical - not practical - reasons, I like the idea of studying analogies in a heap coming from a nonabelian Lie group, like $\mathrm{SU}(2)$. We can give a nonabelian Lie group a metric, and it will always look "almost abelian" on small scales. This is why people study Lie algebras. In my simple example: if you look at a tiny patch of $\mathrm{SU}(2)$, it looks almost flat, and you think you're in the vector space $\mathbb{R}^3$. Deviations from the commutative law become noticeable only at larger scales.
In fact, I believe deviations from the commutative law stated in heap form:
$ab^{-1}c = cb^{-1}a$
will be of order $\epsilon^2$ if $a, b, c$ are all of distance $\sim \epsilon$ from each other.
So, if we think of a nonabelian Lie group as a space of "concepts" (whatever those are), analogies involving 4 concepts that are very close to each other will seem to commute, because any deviations from commutativity will be unnoticeable in the haze of error bars.
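Here is a quick numerical sanity check of that $\epsilon^2$ claim (a sketch only; using $\mathrm{SO}(3)$ and the matrix norm, which are my choices, not anything fixed above):

```python
import numpy as np

rng = np.random.default_rng(0)

def rot(axis, angle):
    """Rotation about a unit axis by the given angle (Rodrigues' formula)."""
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1 - np.cos(angle)) * (K @ K)

def random_rotation_near_identity(eps):
    axis = rng.normal(size=3)
    return rot(axis / np.linalg.norm(axis), eps * rng.uniform())

for eps in (0.1, 0.01, 0.001):
    a, b, c = (random_rotation_near_identity(eps) for _ in range(3))
    t_abc = a @ b.T @ c                         # b.T = b^{-1} for a rotation
    t_cba = c @ b.T @ a
    print(eps, np.linalg.norm(t_abc - t_cba))   # shrinks roughly like eps**2
```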
However, some real-world analogies seem to involve 4 concepts that form roughly a long thin parallelogram: we have two pairs of concepts, and the two concepts in each pair are close.
For example, "mom" and "dad" are close, and "electric drill" and "sewing machine" are close... but the people are not close to the machines.
So the analogy
"mom is to dad as sewing machine is to electrical drill"
is different in flavor than
"mom is to sewing machine as to dad is to electrical drill".
In the former one,
if we think of this as happening in a group with a metric, we're equating two group elements that are small, since "mom" is close to "dad" and "sewing machine" is close to "electric drill".
In the latter,
we're equating two group elements that are large.
The latter could be more error-prone... actually regardless of whether the commutative law holds or not. And chatGPT seems to dislike the latter kind of analogy: it would say it's bad because "mom" and "sewing machine" are not in the same "category".
Well, I thought my point was going to be about detecting deviations from the commutative law in analogies where some of the four items are quite far from each other. But I seem to be saying something else:
A lot of analogies have four items that come in two pairs
x is to x' as y is to y'
with x close to x' and y close to y'. And then applying the commutative law, we get an analogy
x is to y as x' is to y'
that is "disfavored" for some reason, because x is not close to y, and x' is not close to y'.
And by the way, it seems that analogies are even more disfavored if 3 of the items are all far from each other. Maybe someone here already said that. But something like
"mom is to sewing machine as car is to... "
seems very disfavored.
(I don't actually think a metric captures the full problem here.)
So, just to summarize what I hear you saying
- a : b :: c : d would be in heap-language $ab^{-1} = cd^{-1}$, so a measure of this is to ask how "far" $ab^{-1}$ is from $cd^{-1}$. This uses similarity as "have the same ratio"
- analogies are geometric thinking in a Euclidean space, in which similar things are close, which is why they appear abelian. This uses similarity as "have a small Euclidean distance in low dimension"
- by taking the commutator of the representation of the group we move to a Lie algebra in which the small deviations from abelian-ness are highlighted.
- So we might hope to derive a locally-Euclidean manifold from these word2vec style distances, if such a dataset were at hand. (A quick google search didn't turn up anything, which is hopeful, but maybe Rémy knows if this has already been done.)
John Baez said:
For purely mathematical - not practical - reasons, I like the idea of studying analogies in a heap coming from a nonabelian Lie group, like $\mathrm{SU}(2)$. [..] In fact, I believe deviations from the commutative law stated in heap form:
$ab^{-1}c = cb^{-1}a$
will be of order $\epsilon^2$ if $a, b, c$ are all of distance $\sim \epsilon$ from each other.
A nice observation; I believe it's one possible rigorous counterpart to the intuitive notion that analogies involving fewer differences and smaller differences are easier to solve.
An example of fewer differences from the ARC challenge is shown below. I expect a random ARC solver would have a lower success rate on the analogy completion involving the 1st and 3rd rows than the one involving the 1st and 2nd. (Keep in mind that these are both very easy instances compared to most actual ARC challenge problems.)
John Baez said:
(I don't actually think a metric captures the full problem here.)
There should be various results around this idea, some algebraic, some geometric.
For example, there are concepts which differ only along a few axes in a linear concept space (e.g. one could say that "king" and "queen" mostly, but not entirely, differ along a gender axis of some sort), and concepts which differ along many (e.g. "blue" and "multidisciplinary" presumably differ along many).
If I randomly pick two affine transformations $f$ and $g$ which both have a large number of invariant subspaces (and thus fix many axes/properties), I would guess they are much more likely to commute than two transformations deliberately chosen to leave few subspaces invariant. I bet someone has counted this over finite fields, where commuting probability has a natural measure.
Since many analogies involve a small number of things, this random math-fact might be helpful.
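A crude way to poke at that guess (my own sketch over GF(5); "many invariant subspaces" is modelled very coarsely, with upper-triangular matrices sharing the standard flag and diagonal matrices as the extreme case):

```python
import numpy as np

p, n, trials = 5, 3, 20000
rng = np.random.default_rng(1)

def commute_rate(sample):
    hits = 0
    for _ in range(trials):
        A, B = sample(), sample()
        if np.array_equal((A @ B) % p, (B @ A) % p):
            hits += 1
    return hits / trials

general    = lambda: rng.integers(0, p, size=(n, n))
triangular = lambda: np.triu(rng.integers(0, p, size=(n, n)))  # all preserve the standard flag
diagonal   = lambda: np.diag(rng.integers(0, p, size=n))       # fix every coordinate axis

print("general   :", commute_rate(general))
print("triangular:", commute_rate(triangular))
print("diagonal  :", commute_rate(diagonal))   # diagonal matrices always commute
```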
You can scale the matrices of the standard Weyl-Cartan basis of $\mathfrak{sl}_2$ to get a Lie bracket (well, technically just an anti-commutative magma with involution) on a finite set. So you could plug in three differences of concepts for the basis elements and play around with vectors for them.
The scaled matrices (each scaled by 1/2) give you a Lie bracket on a finite set:
emd-tabular.png
The tabulation of the Lie bracket above doesn't render for me, here's a latex rendering. (okay, it works after your edit)
Eric M Downes said:
- So we might hope to derive a locally-Euclidean manifold from these word2vec style distances, if such a dataset were at hand. (A quick google search didn't turn up anything, which is hopeful, but maybe Rémy knows if this has already been done.)
Key words such as "contextual embeddings" and "analogies" led me to these papers(+datasets)/discussions, which could potentially enrich the discussion:
The second paper, which proposes a dataset, also proposes an optimization function to approximate the underlying [[heap]]:
Screenshot-2024-06-28-at-7.49.28-AM.png
Eric M Downes said:
So, just to summarize what I hear you saying
- a : b :: c : d would be in heap-language $ab^{-1} = cd^{-1}$, so a measure of this is to ask how "far" $ab^{-1}$ is from $cd^{-1}$. This uses similarity as "have the same ratio"
Right. For those who haven't been paying careful attention so far: the set of concepts $H$ is a [[heap]], and from any heap $H$ we can functorially construct a group $G$ called its structure group, which acts on the left on $H$, and there's a division operation $H \times H \to G$ from the heap to this group, which Eric is using here.
(I mention this because traditional heap theorists focus on the ternary operation $t$ that any heap has, while we're breaking it down into a bunch of different operations: the division operation $H \times H \to G$, the group action or "multiplication" $G \times H \to H$, and the group operations on $G$. It's an equivalent framework but I think it makes it easier to talk about things.)
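As a concrete toy rendering of this decomposition, here is a sketch with invertible matrices standing in for both the heap and its structure group (a degenerate choice, but it shows the shapes of the operations):

```python
import numpy as np

# Model heap H: invertible 2x2 real matrices. Its structure group G is
# modelled here by the same matrices; in general H and G are different things.

def division(a, b):
    """The 'a is to b' element of the structure group: a / b = a b^{-1}."""
    return a @ np.linalg.inv(b)

def act(g, h):
    """Left action of the structure group on the heap."""
    return g @ h

def t(a, b, c):
    """The traditional ternary heap operation, recovered as 'divide, then act'."""
    return act(division(a, b), c)

a, b, c = np.array([[1., 1.], [0., 1.]]), np.eye(2), np.array([[0., 1.], [1., 0.]])
print(t(a, b, c))   # equals a b^{-1} c
```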
David Egolf said:
I was also talking to chatGPT about this topic. It seemed to have difficulty on this and related topics! For example, we have this bold claim:
chatgpt.png [moving an object one foot north and then one foot east is not the same as moving it one foot east and then one foot north]
Moments like these help me remember that chatGPT4o still has significant limitations.
I think ChatGPT's remark is correct: for an extreme case, imagine a body that is two feet south of the north pole.... or if you want a very extreme case, one foot south of the north pole!
But I don't know if it's being super-smart, or super-dumb!
(Following on John's comment re heaps.
For those familiar with group [[action]]s, this diagram describes a heap in terms of the (left) group-action $G \times H \to H$ and the division/inverse operation $H \times H \to G$; I find the presentation of the division/inverse particularly Yonedafull. And no, nothing to do with compsci data structures.)
Zoltan A. Kocsis (Z.A.K.) said:
Re heaps themselves: I just remembered attending a talk by Joe Razavi around 2018, where he presented something he tentatively called sherds at the time, which were categorial gadgets very closely related to torsors. In hindsight they might well have been categorified heaps. I don't think he published about them, but I hope he'll have some notes or slides to share.
Do you mean a [[vertical categorification]] or a [[horizontal categorification]]?
The horizontal categorification would arise from trying to fill in this analogy
"groups are to heaps as groups are to ... what?"
The answer is obviously "heapoids". :upside_down: But what is a heapoid?
(Yes, if we had a well-defined heap of mathematical concepts we could use it to compute the definition of "heapoid" by solving the equation "group is to heap as groupoid is to... what?" But alas we don't!)
There is a claim, added to the nLab page [[heap]] by @Toby Bartels, that heapoids exist. But when I try to track down a definition, all I can find is this MathOverflow comment by Mike Shulman, which proposes a definition in response to someone complaining about the nLab page.
@John Baez (Typo: "groups are to heaps as groups are to ... what?" one of those 'groups's should say 'groupoids')
John Baez said:
There is a claim, added to the nLab page [[heap]] by Toby Bartels, that heapoids exist.
That claim was added by Zoran Škoda in revision 2; I corrected a typo in the spelling in revision 3.
Aha! So we have to track down Zoran and get him to hand over the definition of heapoid! Sorry to blame this on you. :upside_down:
John Baez said:
And by the way, it seems that analogies are even more disfavored if 3 of the items are all far from each other. Maybe someone here already said that. But something like
"mom is to sewing machine as car is to... "
seems very disfavored.
I was intrigued by the expectation that analogies involving 3 far apart items would be disfavored. So I loaded up a word2vec analogy solver (take the embedding, do vector arithmetic, then search for a good word using cosine similarity) with Google's datasets. Although Google's analogy dataset is not great (looks like it's full of boring grammatical analogies and geographical facts), the analogies it contains are at least compelling to humans.
After filtering out analogies that the solver couldn't solve, I analyzed the remaining 17,673. I defined the width of an analogy completion problem `a :: b ::: c :: ?` as the Euclidean norm between b and a, and the height as the norm between d and c. I measure lopsidedness using the aspect ratio minus one (so squares have zero lopsidedness, and infinitesimally thin rectangles have infinite lopsidedness).
Assuming longer analogies with three far-apart items would be harder to form compellingly, I expected a positive correlation between total length and lopsidedness. However, I actually got a weak negative correlation (-0.06); long analogies are not necessarily more lopsided. Tentatively this suggests that the three items being far apart does not, by itself, make analogies less compelling.
Scatterplot of length vs. lopsidedness (385 randomly sampled analogies):
abp-length-vs-lopsidedness.png
Example long (95th percentile length) analogy problems:
| PROBLEM | length | lopsidedness |
| --- | --- | --- |
| Vietnam :: dong ::: Macedonia :: ? | 7.866435 | 0.008287191 |
| Mexico :: peso ::: Ukraine :: ? | 8.244011 | 0.16631973 |
| debug :: debugging ::: predict :: ? | 8.030302 | 0.98494625 |
Example short (5th percentile length) analogy problems:
| PROBLEM | length | lopsidedness |
| --- | --- | --- |
| horse :: horses ::: donkey :: ? | 4.570353 | 0.718415 |
| Ukraine :: Ukrainian ::: Russia :: ? | 4.324498 | 0.10069001 |
| nephew :: niece ::: son :: ? | 3.419754 | 0.33593798 |
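For anyone who wants to reproduce something like this, here is roughly what the solve-and-measure step could look like with gensim (the model name, and how exactly "length" combines width and height, are my assumptions rather than the actual code used above):

```python
import numpy as np
import gensim.downloader as api

model = api.load("word2vec-google-news-300")   # any KeyedVectors model would do

def solve_and_measure(a, b, c):
    # vector-arithmetic solver for  a :: b ::: c :: ? , searched by cosine similarity
    d = model.most_similar(positive=[b, c], negative=[a], topn=1)[0][0]
    width = np.linalg.norm(model[b] - model[a])
    height = np.linalg.norm(model[d] - model[c])
    lopsidedness = max(width, height) / min(width, height) - 1
    length = width + height                    # one possible notion of "total length"
    return d, length, lopsidedness

print(solve_and_measure("man", "king", "woman"))   # expect something like ('queen', ..., ...)
```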
Nice work and thanks for giving us this insight @Zoltan A. Kocsis (Z.A.K.)! I have a couple of questions regarding your computation, and maybe you can correct me. For simplicity, I will take $u = b - a$ and $v = d - c$. Given these notations, did you do the following?
If this is what you did, then we would have width $= \lVert u \rVert$ and height $= \lVert v \rVert$, and hence lopsidedness $= \max(\lVert u \rVert, \lVert v \rVert)/\min(\lVert u \rVert, \lVert v \rVert) - 1$. This means that your graph will be affected by the norms of the vectors that you sample. If you sample random data points, maybe this makes your graph more random? (please correct me if you have more insight on this)
Somehow I feel like it could be interesting to have $\lVert u \rVert$ and $\lVert v \rVert$ separated on one side and mixed together on the other side. Do you think it could be interesting to plot one of these:
@Rémy Tuyéras Thanks for your suggestions: 1 and 2 are precisely what I've done. Alas, I haven't had time to think about your suggestions yet, much less test them on the actual data (things are busy), but I do intend to give it a go some time next week!
Following on the topic of linguistics and natural language processing, I present this as an open challenge if anyone wants to try to submit code to achieve the following. I also spend some time trying to understand exponents.