Here are two analogies:
"man is to woman as king is to queen"
and
"king is to man as queen is to woman"
There's a vector space model of concepts where these are written as equations
man − woman = king − queen
and
king − man = queen − woman.
You can draw the four differences here as the four sides of a parallelogram. The two equations are equivalent because a vector space is an abelian group. In a nonabelian group, written multiplicatively,
$ab^{-1} = cd^{-1}$
is not equivalent to
$ac^{-1} = bd^{-1}$.
So my main question is: are analogies generally 'abelian', or can you think of examples where
a is to b as c is to d
but not
a is to c as b is to d.
There's a general notion of [[heap]], which sheds some light on this. A heap has a ternary operation $t$ where $t(a,b,c)$ answers the question "$a$ is to $b$ as what is to $c$?" Any group gives a heap where
$t(a,b,c) = ab^{-1}c$,
and any heap is isomorphic to one coming from a group. But abelian groups give heaps satisfying the 'abelian' law
$t(a,b,c) = t(c,b,a)$.
So I'm wondering if analogies sometimes break this law.
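To make the heap operation concrete, here's a minimal sketch (a toy check, not from the original discussion) verifying in a small nonabelian group, $S_3$ acting on $\{0,1,2\}$, that the 'abelian' law $t(a,b,c) = t(c,b,a)$ really can fail:

```python
from itertools import permutations

def compose(p, q):
    """(p ∘ q)(i) = p(q(i)); permutations stored as tuples."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def t(a, b, c):
    """Heap operation t(a,b,c) = a b^{-1} c coming from the group."""
    return compose(a, compose(inverse(b), c))

S3 = list(permutations(range(3)))
violations = sum(t(a, b, c) != t(c, b, a) for a in S3 for b in S3 for c in S3)
print(f"{violations} of {len(S3)**3} triples violate t(a,b,c) = t(c,b,a)")
```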
Yes, it is the consensus among cognitive scientists that analogies are generally commutative (rotatable).
Consider a letter-string analogy problem
"abc" :: "abd"
"ijk" :: ???
Such analogy problems have many spurious solutions (see Melanie Mitchell's introduction). It's famous folklore that in a conversation between D. Hofstadter and R. Feynman, the latter insisted that the right solution is "abd" by Ockham-style reasoning. The counterarguments to Feynman's solution already involved some commutative square considerations (Feynman's reasoning would yield a different solution to the symmetric `abc :: ijk ::: abd :: ?`). Fluid Concepts and Creative Analogies, the first book sold on Amazon, deals with this in depth: the "dissatisfaction" value of Chapter 5 is essentially a measure of failure to commute when composing two relations.
For solving such analogies, some traditional AI systems try to synthesize pairs of functions `f, g : String -> String` where `f("abc") = "abd"` and `g("abc") = "ijk"` hold. A solution to the analogy is found when both `(g ∘ f)("abc")` and `(f ∘ g)("abc")` produce the same result. And a commutative square is a commutative square no matter which way you look at it.
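As a hedged illustration of this search-for-commuting-functions idea (my own toy DSL, not any particular system from the literature), here is a brute-force version for the `abc`/`abd`/`ijk` problem:

```python
# Toy DSL: each candidate program shifts either every letter or only the
# last letter by a fixed amount. We look for f with f("abc") = "abd" and
# g with g("abc") = "ijk", then read the answer off the common value of
# the two composites -- i.e. we insist the square commutes.

def shift(s, k):
    return "".join(chr((ord(ch) - ord("a") + k) % 26 + ord("a")) for ch in s)

def shift_last(s, k):
    return s[:-1] + shift(s[-1], k)

candidates = [(lambda s, op=op, k=k: op(s, k))
              for op in (shift, shift_last) for k in range(1, 26)]

fs = [f for f in candidates if f("abc") == "abd"]   # e.g. "shift the last letter by 1"
gs = [g for g in candidates if g("abc") == "ijk"]   # e.g. "shift every letter by 8"

for f in fs:
    for g in gs:
        if g(f("abc")) == f(g("abc")):
            print("commuting solution:", g(f("abc")))   # prints "ijl"
```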
This is not limited to toy domains like letter-string analogies, though. In the 2010s, Excel's generative autofill (FlashFill) was implemented using the very same principle (using each row as part of an "analogy" in an action synthesis problem). Some current close-to-SoTA solutions to Chollet's Abstraction and Reasoning Challenge (ARC) also use a variant of the same idea; however, this application requires a very, very good program synthesizer. A good LLM can mostly do the job, but you can't make the ARC leaderboard if you call out to an external LLM.
Moreover, this principle explains a part of the early success that word2vec-style embeddings and vector differences had in forming natural language analogies (king/queen//man/woman etc.), and how they relate to the earlier search-based GOFAI approaches: they solve an analogy `a :: b ::: c :: ?` by "synthesizing" the very simple programs $x \mapsto x + (b - a)$ and $x \mapsto x + (c - a)$, which commute by default. In contrast, it's really quite difficult to find commuting programs using unconstrained genetic programming or other common kinds of search.
(Cogsci literature on such topics is kinda hard to read, but I found it worth engaging with: they did think about things a lot, and the existing literature contains most ideas one would naively have in several cryptomorphic forms)
Why should one think that the alternative solutions ("ijl", say) are spurious rather than thinking that there are just several good answers?
@Evan Washington There _can_ be several good answers. The domain experts tend to agree that "ijd" is not a particularly good answer for this particular problem compared to "ijl", and they have written books of arguments about why they think that and what good solutions are (including a fair bit of empirical research about which letter-string analogies people find compelling). (More importantly, I misquoted Feynman's solution, which was actually that "ijk" maps to "abd" too (fixed now in edit), which is much more clearly spurious than "ijd".)
IIRC it was an interesting exercise to write down two short commuting Python programs which realize ("abc"->"abd", "abc"->"ijk") with their composites sending "abc"->"ijd", but I did it over 10 years ago so I can't vouch for that. The ones that realize "ijk" -> "ijl" are much easier.
The analogy below, however, is definitely nice to contemplate (especially if one has seen similar pushout/pullback squares).
nonlinguistic.png
Interesting stuff! We can write the first analogy as $f(a) = b$ and $f(c) = d$. That is, there is some process $f$ which produces both $b$ from $a$ and $d$ from $c$. The second analogy can be written as $g(a) = c$ and $g(b) = d$. I considered the special case where we can think of the first process as "evaluating a property $p$". So, I write the first analogy as $a.p = b$ and $c.p = d$. The idea is that the $p$-th property of $a$ is $b$ and the $p$-th property of $c$ is $d$.
Using this notation, $g(b) = d$ can then be written as $g(a.p) = c.p$. Since $c = g(a)$, this can be rewritten as $g(a.p) = g(a).p$. That is, the process $g$ "preserves the property $p$".
We can always achieve this situation by defining $g(a.p)$ to be $g(a).p$, after setting $g(a) = c$. And this seems to give reasonable analogies in some examples. For example, consider the analogy "tree is to leaf as body is to hand". We have tree.p = leaf and body.p = hand. Then we set g(tree) = body, and to finish our second analogy we also need to have g(tree.p) = body.p. We can achieve this by setting g(tree.p) = g(leaf) = hand. And so our second analogy ends up being "tree is to body as leaf is to hand", which seems reasonable.
In general, it seems plausible to expect that if x is analogous to y, then the pth part of x will be intuitively analogous to the pth part of y. I think this relates to the fact that we can often "swap kinds of things" and "swap between the whole and a specific part of things" in either order and get the same answer.
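A throwaway sketch of that commuting square, with made-up data (nothing canonical about these dictionaries): "take the p-th part" and "swap the kind of thing" can be applied in either order and land on the same concept.

```python
part = {"tree": "leaf", "body": "hand"}        # x  ->  x.p   (whole to part)
swap_kind = {"tree": "body", "leaf": "hand"}   # x  ->  g(x)  (plant to animal)

x = "tree"
print(part[swap_kind[x]])   # g(x).p  -> "hand"
print(swap_kind[part[x]])   # g(x.p)  -> "hand", same answer either way
```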
John Baez said:
So my main question is: are analogies generally 'abelian', or can you think of examples where
a is to b as c is to d
but not
a is to c as b is to d.
This is quite an interesting question. Given that the answer may also depend on data, context, and culture, there is a lot of room for debate. Therefore, I am not sure how my examples below will be received, but I am trying to throw out some ideas.
Suppose that you define:
car − dad = flowers − mom,
such that: A car is to dad as flowers are to mom.
But because cars are expensive, we could also have:
socks − dad = flowers − mom,
such that: In all practicality, socks are to dad as flowers are to mom.
Now, if the structure above were Abelian, we could rearrange to get car = dad + (flowers − mom) = socks. The problem is that we cannot represent car as the expression:
dad + (flowers − mom),
because car is contextual.
If this is of any interest to anyone, I have studied these questions in commutative idempotent monoids (i.e. where you have $x + y = y + x$ and $x + x = x$) in this paper. There, I provide a generalized linear algebra for commutative idempotent monoids to solve these kinds of questions (by defining an imaginary subtraction operation $x - y$, even though the monoid has no genuine inverses). My goal was to untangle complex relationships between haplotypes and genes in populations over generations. For these problems, I was handling the contextuality of these relations by using some sheaf-like functors on local parts of DNA.
[Thanks for linking that paper @Rémy Tuyéras! I still have a very long ways to go, but one of my main long term goals is to do math that studies the process of medical imaging reconstruction. I anticipate that this will involve thinking deeply about what it means to "untangle complex relationships". In my case this would involve studying the relationships between observations obtained in different ways (e.g. as we perturb the thing to be imaged differently, or as we take observations on different sensors or at different times). I assume that this medical imaging direction is quite different from what you describe in your paper. Nonetheless, I am still excited to someday read your paper to begin understanding some of the modern mathematical techniques and ideas that can be used to study complex relationships!]
here's an example (perhaps) of an analogy breaking the abelian law. in the (undirected) path graph c-a-b-d, it seems true that a is to c as b is to d (there is an edge between each pair). but it seems false that a is to b as c is to d (there is an edge between a and b but not between c and d).
and here's a natural language example (which you may find less plausible). Father is to father's father (paternal grandfather) as mother is to mother's mother (maternal grandmother), but it seems like it's not the case that father is to mother as paternal grandfather is to maternal grandmother. (It seems better, to me, to say that father is to mother as paternal grandfather is to paternal grandmother.)
That's a cool paper @Rémy Tuyéras ! I look forward to reading it more. Hopefully you won't mind some possibly superficial reflections.
It seems like you're pointing out that unlike with a simple binary operation as in a group, the category may have more than one object, and so moving along morphisms in certain directions requires one to specify more information for unique inverses.
So, Paris : France :: ____ : USA
could be solved by either Washington DC or New York, based on whether it is "capital" or "most recognizable city". There should be a way to encode the multi-dimensionality of the analogy into the coordinates of the map, so you would end up with a different map for each sense.
So in the case of "getting stereotypical gifts for Mom and Dad", you might end up with a separate map for each sense of gift-giving.
So, to go in the other direction: while the curried 1-argument maps (~actions) might be abelian and invertible, the currying process requires matching on the sense in which the other arguments are used, which we haven't done in the case of the degenerate example you cited (= too many maps end up on Flowers), e.g.
Flowers : Flowers : Breakfast-in-Bed : Mom :: Socks : Tie : Mug : Dad
is plausible but
Flowers : Flowers : Mom :: Socks : Car : Dad
is more clearly wrong.
It seems like Lawvere metric spaces might capture some of this? That allows the coordinates of normal vector spaces (which is what word2vec is doing in John's examples), while also allowing better level-shifting and more general values?
@Evan Washington Interesting examples! But I think in the undirected graph c-a-b-d the pairs (a,b) and (c,d) do look quite analogous to me. They are both pairs switched by an automorphism of the structure (so c,d in a sense share all properties; in particular their degrees).
Then again, I'm predisposed to look for automorphisms. But if somebody asks me to complete the analogy `a :: b ::: c :: ?` with the graph above, I'd think the most compelling solution I could find would still be d, by some margin, no matter what I considered.
I'd love to find a scenario where presented with `a :: b ::: c :: ?` most people would immediately want to answer X, but when presented with `a :: c ::: b :: ?` most people would immediately come up with some different Y instead.
Eric M Downes said:
It seems like you're pointing out that unlike with a simple binary operation as in a group, the category may have more than one object, and so moving along morphisms in certain directions requires one to specify more information for unique inverses.
This was not exactly where I was going, but I do see the connection, mostly if you see words as being able to change the context of a conversation while being said/used (which is very possible)
Speaking of embeddings (and transformers), I was wondering how chatgpt would interpret the two relationships, and I was quite surprised by the second answer:
Screenshot_20240627_161236_Chrome.jpg
Screenshot_20240627_161246_Chrome.jpg
Love that ChatGPT has learned to parrot equivocation boilerplate. The second answer seems to imply that people and flowers are, to an AI, essentially the same as tools. :)
I was also talking to chatGPT about this topic. It seemed to have difficulty on this and related topics! For example, we have this bold claim:
chatgpt.png
Moments like these help me remember that chatGPT4o still has significant limitations.
Well, this is true if you push the block really far at each step...
Re heaps themselves: I just remembered attending a talk by @Joe Razavi around 2018, where he presented something he tentatively called sherds at the time, which were categorial gadgets very closely related to torsors. In hindsight they might well have been categorified heaps. I don't think he published about them, but I hope he'll have some notes or slides to share.
Going back a long ways: thanks, @Zoltan A. Kocsis (Z.A.K.), for explaining some of the huge amount of work that has been done on these issues. The sheer mass of it makes me less inclined to think about this more... I like 'virgin territory': a topic that either people haven't studied yet, or, almost as good, a topic where I don't know what they've said. :upside_down:
(I'm going to revive my much-maligned upside-down smiley because only that emoticon accurately conveys my emotion here.)
For purely mathematical - not practical - reasons, I like the idea of studying analogies in a heap coming from a nonabelian Lie group, like $\mathrm{SU}(2)$. We can give a nonabelian Lie group a metric, and it will always look "almost abelian" on small scales. This is why people study Lie algebras. In my simple example: if you look at a tiny patch of $\mathrm{SU}(2)$, it looks almost flat, and you think you're in the vector space $\mathbb{R}^3$. Deviations from the commutative law become noticeable only at larger scales.
In fact, I believe deviations from the commutative law stated in heap form:
$ab^{-1}c = cb^{-1}a$
will be of order $\epsilon^2$ if $a, b, c$ are all of distance $\sim \epsilon$ from each other.
So, if we think of a nonabelian Lie group as a space of "concepts" (whatever those are), analogies involving 4 concepts that are very close to each other will seem to commute, because any deviations from commutativity will be unnoticeable in the haze of error bars.
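Here is a quick numerical sanity check of that $\epsilon^2$ claim (a sketch only; using $\mathrm{SO}(3)$ and the matrix norm, which are my choices, not anything fixed above):

```python
import numpy as np

rng = np.random.default_rng(0)

def rot(axis, angle):
    """Rotation about a unit axis by the given angle (Rodrigues' formula)."""
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1 - np.cos(angle)) * (K @ K)

def random_rotation_near_identity(eps):
    axis = rng.normal(size=3)
    return rot(axis / np.linalg.norm(axis), eps * rng.uniform())

for eps in (0.1, 0.01, 0.001):
    a, b, c = (random_rotation_near_identity(eps) for _ in range(3))
    t_abc = a @ b.T @ c                         # b.T = b^{-1} for a rotation
    t_cba = c @ b.T @ a
    print(eps, np.linalg.norm(t_abc - t_cba))   # shrinks roughly like eps**2
```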
However, some real-world analogies seem to involve 4 concepts that form roughly a long thin parallelogram: we have two pairs of concepts, and the two concepts in each pair are close.
For example, "mom" and "dad" are close, and "electric drill" and "sewing machine" are close... but the people are not close to the machines.
So the analogy
"mom is to dad as sewing machine is to electrical drill"
is different in flavor than
"mom is to sewing machine as to dad is to electrical drill".
In the former one,
if we think of this as happening in a group with a metric, we're equating two group elements that are small, since "mom" is close to "dad" and "sewing machine" is close to "electric drill".
In the latter,
we're equating two group elements that are large.
The latter could be more error-prone... actually regardless of whether the commutative law holds or not. And chatGPT seems to dislike the latter kind of analogy: it would say it's bad because "mom" and "sewing machine" are not in the same "category".
Well, I thought my point was going to be about detecting deviations from the commutative law in analogies where some of the four items are quite far from each other. But I seem to be saying something else:
A lot of analogies have four items that come in two pairs
x is to x' as y is to y'
with x close to x' and y close to y'. And then applying the commutative law, we get an analogy
x is to y as x' is to y'
that is "disfavored" for some reason, because x is not close to y, and x' is not close to y'.
And by the way, it seems that analogies are even more disfavored if 3 of the items are all far from each other. Maybe someone here already said that. But something like
"mom is to sewing machine as car is to... "
seems very disfavored.
(I don't actually think a metric captures the full problem here.)
So, just to summarize what I hear you saying
- a : b :: c : d would be in heap-language $ab^{-1} = cd^{-1}$, so a measure of this is to ask how "far" $ab^{-1}$ is from $cd^{-1}$. This uses similarity as "have the same ratio"
- analogies are geometric thinking in a Euclidean space, in which similar things are close, which is why they appear abelian. This uses similarity as "have a small Euclidean distance in low dimension"
- by taking the commutator of the representation of the group we move to a Lie algebra in which the small deviations from abelian-ness are highlighted.
- So we might hope to derive a locally-Euclidean manifold from these word2vec style distances, if such a dataset were at hand. (A quick google search didn't turn up anything, which is hopeful, but maybe Rémy knows if this has already been done.)
John Baez said:
For purely mathematical - not practical - reasons, I like the idea of studying analogies in a heap coming from a nonabelian Lie group, like $\mathrm{SU}(2)$. [..] In fact, I believe deviations from the commutative law stated in heap form:
$ab^{-1}c = cb^{-1}a$
will be of order $\epsilon^2$ if $a, b, c$ are all of distance $\sim \epsilon$ from each other.
A nice observation; I believe it's one possible rigorous counterpart to the intuitive notion that analogies involving fewer differences and smaller differences are easier to solve.
An example of fewer differences from the ARC challenge is shown below. I expect a random ARC solver would have a lower success rate on the analogy completion involving the 1st and 3rd rows than the one involving the 1st and 2nd. (Keep in mind that these are both very easy instances compared to most actual ARC challenge problems.)
John Baez said:
(I don't actually think a metric captures the full problem here.)
There should be various results around this idea, some algebraic, some geometric.
For example, there are concepts which differ only along a few axes in a linear concept space (e.g. one could say that "king" and "queen" mostly, but not entirely, differ along a gender axis of some sort), and concepts which differ along many (e.g. "blue" and "multidisciplinary" presumably differ along many).
If I randomly pick two affine transformations $f$ and $g$ which both have a large number of invariant subspaces (and thus fix many axes/properties), I would guess they are much more likely to commute than two transformations deliberately chosen to leave few subspaces invariant. I bet someone has counted this over finite fields, where commuting probability has a natural measure.
Since many analogies involve a small number of things, this random math-fact might be helpful.
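A crude way to poke at that guess (my own sketch over GF(5); "many invariant subspaces" is modelled very coarsely, with upper-triangular matrices sharing the standard flag and diagonal matrices as the extreme case):

```python
import numpy as np

p, n, trials = 5, 3, 20000
rng = np.random.default_rng(1)

def commute_rate(sample):
    hits = 0
    for _ in range(trials):
        A, B = sample(), sample()
        if np.array_equal((A @ B) % p, (B @ A) % p):
            hits += 1
    return hits / trials

general    = lambda: rng.integers(0, p, size=(n, n))
triangular = lambda: np.triu(rng.integers(0, p, size=(n, n)))  # all preserve the standard flag
diagonal   = lambda: np.diag(rng.integers(0, p, size=n))       # fix every coordinate axis

print("general   :", commute_rate(general))
print("triangular:", commute_rate(triangular))
print("diagonal  :", commute_rate(diagonal))   # diagonal matrices always commute
```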
You can scale the matrices of the standard Weyl-Cartan basis of $\mathfrak{sl}_2$ to get a Lie bracket (well, technically just an anti-commutative magma with involution) on a finite set. So you could plug in three differences of concepts for the basis elements and play around with vectors for them.
The scaled matrices (each scaled by 1/2) give you a Lie bracket on a finite set:
emd-tabular.png
The tabulation of the Lie bracket above doesn't render for me, here's a latex rendering. (okay, it works after your edit)
Eric M Downes said:
- So we might hope to derive a locally-Euclidean manifold from these word2vec style distances, if such a dataset were at hand. (A quick google search didn't turn up anything, which is hopeful, but maybe Rémy knows if this has already been done.)
Key words such as "contextual embeddings" and "analogies" led me to these papers(+datasets)/discussions, which could potentially enrich the discussion:
The second paper, which proposes a dataset, also proposes an optimization function to approximate the underlying [[heap]]:
Screenshot-2024-06-28-at-7.49.28-AM.png
Eric M Downes said:
So, just to summarize what I hear you saying
- a : b :: c : d would be in heap-language $ab^{-1} = cd^{-1}$, so a measure of this is to ask how "far" $ab^{-1}$ is from $cd^{-1}$. This uses similarity as "have the same ratio"
Right. For those who haven't been paying careful attention so far: the set of concepts $H$ is a [[heap]], and from any heap $H$ we can functorially construct a group $G$ called its structure group, which acts on the left on $H$, and there's a division operation $H \times H \to G$ from the heap to this group, which Eric is using here.
(I mention this because traditional heap theorists focus on the ternary operation $t$ that any heap has, while we're breaking it down into a bunch of different operations: the division operation $H \times H \to G$, the group action or "multiplication" $G \times H \to H$, and the group operations on $G$. It's an equivalent framework but I think it makes it easier to talk about things.)
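As a concrete toy rendering of this decomposition, here is a sketch with invertible matrices standing in for both the heap and its structure group (a degenerate choice, but it shows the shapes of the operations):

```python
import numpy as np

# Model heap H: invertible 2x2 real matrices. Its structure group G is
# modelled here by the same matrices; in general H and G are different things.

def division(a, b):
    """The 'a is to b' element of the structure group: a / b = a b^{-1}."""
    return a @ np.linalg.inv(b)

def act(g, h):
    """Left action of the structure group on the heap."""
    return g @ h

def t(a, b, c):
    """The traditional ternary heap operation, recovered as 'divide, then act'."""
    return act(division(a, b), c)

a, b, c = np.array([[1., 1.], [0., 1.]]), np.eye(2), np.array([[0., 1.], [1., 0.]])
print(t(a, b, c))   # equals a b^{-1} c
```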
David Egolf said:
I was also talking to chatGPT about this topic. It seemed to have difficulty on this and related topics! For example, we have this bold claim:
chatgpt.png [moving an object one foot north and then one foot east is not the same as moving it one foot east and then one foot north]
Moments like these help me remember that chatGPT4o still has significant limitations.
I think ChatGPT's remark is correct: for an extreme case, imagine a body that is two feet south of the north pole.... or if you want a very extreme case, one foot south of the north pole!
But I don't know if it's being super-smart, or super-dumb!
(Following on John's comment re heaps.
For those familiar with group [[action]]s, this diagram describes a heap in terms of the (left) group-action $G \times H \to H$ and the division/inverse operation $H \times H \to G$; I find the presentation of the division/inverse particularly Yonedafull. And no, nothing to do with compsci data structures.)
Zoltan A. Kocsis (Z.A.K.) said:
Re heaps themselves: I just remembered attending a talk by Joe Razavi around 2018, where he presented something he tentatively called sherds at the time, which were categorial gadgets very closely related to torsors. In hindsight they might well have been categorified heaps. I don't think he published about them, but I hope he'll have some notes or slides to share.
Do you mean a [[vertical categorification]] or a [[horizontal categorification]]?
The horizontal categorification would arise from trying to fill in this analogy
"groups are to heaps as groups are to ... what?"
The answer is obviously "heapoids". :upside_down: But what is a heapoid?
(Yes, if we had a well-defined heap of mathematical concepts we could use it to compute the definition of "heapoid" by solving the equation "group is to heap as groupoid is to... what?" But alas we don't!)
There is a claim, added to the nLab page [[heap]] by @Toby Bartels, that heapoids exist. But when I try to track down a definition, all I can find is this MathOverflow comment by Mike Shulman, which proposes a definition in response to someone complaining about the nLab page.
@John Baez (Typo: "groups are to heaps as groups are to ... what?" one of those 'groups's should say 'groupoids')
John Baez said:
There is a claim, added to the nLab page [[heap]] by Toby Bartels, that heapoids exist.
That claim was added by Zoran Škoda in revision 2; I corrected a typo in the spelling in revision 3.
Aha! So we have to track down Zoran and get him to hand over the definition of heapoid! Sorry to blame this on you. :upside_down:
John Baez said:
And by the way, it seems that analogies are even more disfavored if 3 of the items are all far from each other. Maybe someone here already said that. But something like
"mom is to sewing machine as car is to... "
seems very disfavored.
I was intrigued by the expectation that analogies involving 3 far apart items would be disfavored. So I loaded up a word2vec analogy solver (take the embedding, do vector arithmetic, then search for a good word using cosine similarity) with Google's datasets. Although Google's analogy dataset is not great (looks like it's full of boring grammatical analogies and geographical facts), the analogies it contains are at least compelling to humans.
After filtering out analogies that the solver couldn't solve, I analyzed the remaining 17,673. I defined the width of an analogy completion problem `a :: b ::: c :: ?` as the Euclidean norm between b and a, and the height as the norm between d and c. I measure lopsidedness using the aspect ratio minus one (so squares have zero lopsidedness, and infinitesimally thin rectangles have infinite lopsidedness).
Assuming longer analogies with three far-apart items would be harder to form compellingly, I expected a positive correlation between total length and lopsidedness. However, I actually got a weak negative correlation (-0.06); long analogies are not necessarily more lopsided. Tentatively this suggests that the three items being far apart does not, by itself, make analogies less compelling.
Scatterplot of length vs. lopsidedness (385 randomly sampled analogies):
abp-length-vs-lopsidedness.png
Example long (95th percentile length) analogy problems:
| PROBLEM | length | lopsidedness |
| --- | --- | --- |
| Vietnam :: dong ::: Macedonia :: ? | 7.866435 | 0.008287191 |
| Mexico :: peso ::: Ukraine :: ? | 8.244011 | 0.16631973 |
| debug :: debugging ::: predict :: ? | 8.030302 | 0.98494625 |
Example short (5th percentile length) analogy problems:
| PROBLEM | length | lopsidedness |
| --- | --- | --- |
| horse :: horses ::: donkey :: ? | 4.570353 | 0.718415 |
| Ukraine :: Ukrainian ::: Russia :: ? | 4.324498 | 0.10069001 |
| nephew :: niece ::: son :: ? | 3.419754 | 0.33593798 |
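For anyone who wants to reproduce something like this, here is roughly what the solve-and-measure step could look like with gensim (the model name, and how exactly "length" combines width and height, are my assumptions rather than the actual code used above):

```python
import numpy as np
import gensim.downloader as api

model = api.load("word2vec-google-news-300")   # any KeyedVectors model would do

def solve_and_measure(a, b, c):
    # vector-arithmetic solver for  a :: b ::: c :: ? , searched by cosine similarity
    d = model.most_similar(positive=[b, c], negative=[a], topn=1)[0][0]
    width = np.linalg.norm(model[b] - model[a])
    height = np.linalg.norm(model[d] - model[c])
    lopsidedness = max(width, height) / min(width, height) - 1
    length = width + height                    # one possible notion of "total length"
    return d, length, lopsidedness

print(solve_and_measure("man", "king", "woman"))   # expect something like ('queen', ..., ...)
```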
Nice work and thanks for giving us this insight @Zoltan A. Kocsis (Z.A.K.)! I have a couple of questions regarding your computation, and maybe you can correct me. For simplicity, I will take $u = b - a$ and $v = d - c$. Given these notations, did you do the following?
If this is what you did, then we would have width $= \lVert u \rVert$ and height $= \lVert v \rVert$, and hence lopsidedness $= \max(\lVert u \rVert, \lVert v \rVert)/\min(\lVert u \rVert, \lVert v \rVert) - 1$. This means that your graph will be affected by the norms of the vectors that you sample. If you sample random data points, maybe this makes your graph more random? (please correct me if you have more insight on this)
Somehow I feel like it could be interesting to have $\lVert u \rVert$ and $\lVert v \rVert$ separated on one side and mixed together on the other side. Do you think it could be interesting to plot one of these:
@Rémy Tuyéras Thanks for your suggestions: 1 and 2 are precisely what I've done. Alas, I haven't had time to think about your suggestions yet, much less test them on the actual data (things are busy), but I do intend to give it a go some time next week!
Following on the topic of linguistics and natural language processing, I present this as an open challenge if anyone wants to try to submit code to achieve the following. I also spend some time trying to understand exponents.