Category Theory
Zulip Server
Archive

You're reading the public-facing archive of the Category Theory Zulip server.
To join the server you need an invite. Anybody can get an invite by contacting Matteo Capucci at name dot surname at gmail dot com.
For all things related to this archive refer to the same person.


Stream: community: our work

Topic: Oscar Cunningham


view this post on Zulip Oscar Cunningham (Nov 01 2024 at 15:09):

I've discovered an interesting way to extend first-order logic to assign 'probabilities' to logical formulas. For example we get the following values:

P(w=x\wedge y\neq z|S:\mathrm{Set},\; w,x,y,z:S) = \frac{1}{5}
P(f\text{ injective}|S,T:\mathrm{Set},\;f:S\to T,\;f\text{ surjective}) = \frac{1}{e^{e-2}}

Has anything like this already been discovered?

view this post on Zulip Oscar Cunningham (Nov 01 2024 at 15:16):

The idea is to treat conditional probabilities P(A|\Gamma) as analogous to sequents \Gamma\vdash A. Irritatingly the conventions change the order of \Gamma and A. I also have a semantic notion which I'll write as P(A\|\Gamma) by analogy to \Gamma\vDash A. One of the interesting things will be that the syntax and semantics agree with each other despite not initially seeming related.

view this post on Zulip Oscar Cunningham (Nov 01 2024 at 15:40):

The syntactic side of things uses the Principle of Indifference, which says that if we don't know anything to distinguish two possibilities, then we should assign them the same probability. I interpret this formally by a rule saying that \alpha-equivalent sequents have the same probability.

view this post on Zulip Peva Blanchard (Nov 01 2024 at 15:41):

This reminds me of this paper (logical induction), but I never managed to find the time to read it thoroughly.

Also, Solomonoff tried to define a "universal prior" in his theory of inductive inference.

view this post on Zulip Oscar Cunningham (Nov 01 2024 at 15:47):

Ah yes, those are definitely related. A logical inductor also assigns probabilities to logical statements, but my understanding is that the probabilities it assigns to undecidable statements can vary depending on arbitrary implementation details, whereas the method I'm developing gives specific values to them. Likewise the exact values given by Solomonoff induction depend on the implementation.

view this post on Zulip Oscar Cunningham (Nov 01 2024 at 15:51):

The other principle that the syntax uses is that introduction of a new symbol via a definition should not change the probabilities. Like if you're playing a game and wondering what card your opponent has, then deciding to call the red cards 'foo' and the black cards 'bar' doesn't change the probability they have the seven of clubs.

view this post on Zulip Oscar Cunningham (Nov 01 2024 at 15:51):

Already this allows you to make some deductions.

view this post on Zulip Oscar Cunningham (Nov 01 2024 at 16:03):

Let's do a simple example in propositional logic. We want to calculate the probability that some proposition A is true, given that we know nothing else about it. So we want to find

P(A|A:\mathrm{Prop}).

We can introduce a new proposition B by defining it as ¬A. Since definitions don't change probabilities we have

P(A|A:\mathrm{Prop})=P(A|A,B:\mathrm{Prop},\; B\leftrightarrow ¬A).

Now since we have B\leftrightarrow ¬A we can rewrite A as ¬B. Likewise we can rewrite B\leftrightarrow ¬A as the logically equivalent A\leftrightarrow ¬B.

P(A|A:\mathrm{Prop})=P(¬B|A,B:\mathrm{Prop},\; A\leftrightarrow ¬B).

Now the right hand side has the form of a definition of A, which we may remove:

P(A|A:\mathrm{Prop})=P(¬B|B:\mathrm{Prop}).

Finally by \alpha-equivalence we have

P(A|A:\mathrm{Prop})=P(¬A|A:\mathrm{Prop}).

Separately we have the law of probability

P(A|A:\mathrm{Prop})+P(¬A|A:\mathrm{Prop})=1.

Substituting one into the other yields

P(A|A:\mathrm{Prop})+P(A|A:\mathrm{Prop})=1

from which it follows that P(A|A:\mathrm{Prop}) = 1/2.

view this post on Zulip Oscar Cunningham (Nov 01 2024 at 16:13):

(I hope I've left people wondering how stupid tricks like the above can lead to a proof that the probability that a surjection is also injective is 1/e^{e-2}. I'll write more later.)

view this post on Zulip Oscar Cunningham (Nov 01 2024 at 21:42):

So having seen that P(A|A:\mathrm{Prop}) is just 1/2, you can probably guess what the semantics are for propositional logic. We just assign a fair coin flip to each atomic proposition and then take the usual probability. So for example P(A\|A,B:\mathrm{Prop},\;A\vee B)=2/3, since out of the four possible outcomes for two coin flips, three have A\vee B, and only two of those have A.
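
For concreteness, this semantics can be checked by brute force, since it's just enumeration over truth assignments. A minimal Python sketch (my own illustration, not part of the formal machinery):

```python
from itertools import product

# Assign an independent fair coin flip to each atomic proposition.
outcomes = list(product([False, True], repeat=2))  # assignments to (A, B)

given = [(a, b) for a, b in outcomes if a or b]    # condition on A ∨ B
favourable = [(a, b) for a, b in given if a]       # ...and keep those where A holds

print(len(favourable), "/", len(given))            # prints: 2 / 3
```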

view this post on Zulip Oscar Cunningham (Nov 01 2024 at 21:44):

Using similar arguments as above we can show that the syntactic reasoning gives the same answers, so we have an analogue of the completeness theorem for propositional logic.

view this post on Zulip John Baez (Nov 01 2024 at 21:53):

Cool! When I see 1/e^{e-2} I think "groupoid cardinality".

The Machine Intelligence Research Institute (Yudkowsky's outfit) had a meeting on trying to assign probabilities to mathematical statements, and I blogged about it here and here. They did not arrive at specific formulas as you are doing. But if you publish something, it'd be worth citing some of their papers, and they might be interested in what you've done.
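
(As a numeric sanity check of that intuition: with the 1/|Aut| weighting Oscar describes later in the thread, a function f:S\to T between sets of sizes s and t gets weight 1/(s!t!). The bijections then have total weight \sum_s 1/s! = e, the surjections have total weight \sum_s B_s/s! = e^{e-1}, and the ratio is 1/e^{e-2}. A small Python sketch of this check, with a helper of my own for the Bell numbers:)

```python
import math

def bell_numbers(n):
    """First n Bell numbers B_0, ..., B_{n-1}, computed via the Bell triangle."""
    bells, row = [1], [1]
    for _ in range(n - 1):
        new_row = [row[-1]]
        for entry in row:
            new_row.append(new_row[-1] + entry)
        row = new_row
        bells.append(row[0])
    return bells

N = 30
B = bell_numbers(N)

# Surjections [s] -> [t] number t!*Stirling2(s,t); weighted by 1/(s!t!) and
# summed over t this gives B_s/s!, and summing over s gives e^(e-1).
surjections = sum(B[s] / math.factorial(s) for s in range(N))
# Bijections contribute s!/(s!s!) = 1/s!, summing to e.
bijections = sum(1 / math.factorial(s) for s in range(N))

print(bijections / surjections)  # ~0.48759
print(math.exp(2 - math.e))      # 1/e^(e-2), the same value
```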

view this post on Zulip Oscar Cunningham (Nov 01 2024 at 22:01):

"groupoid cardinality" :smile: Good intuition!

Yeah, I grew up on LessWrong so I have a good idea of what they're interested in. I also remembered the Progic posts from the n-Category Cafe. It seems that Jon Williamson was also trying to extend logic to probabilities. My plan is to write some things up here as a sounding board and then recompile them more formally as something to link people to. I'm not in academia so I probably won't publish an actual paper.

view this post on Zulip Oscar Cunningham (Nov 01 2024 at 22:25):

Moving on to first-order logic, we can consider the example I gave in the first post.

P(w=x\wedge y\neq z|S:\mathrm{Set},\; w,x,y,z:S)

We're being asked the probability that some variables are equal or not to each other. We'd like to use the Principle of Indifference like we did before. The Principle applies whenever our information is symmetric between two propositions. But we know that = is an equivalence relation whereas \neq is not, so we are not in fact indifferent between equality and its negation. We can't, for example, conclude that w=x has the same probability as w\neq x.

The trick is to forget temporarily what we know about equality, and then remember it again. People sometimes work in first-order logic without equality. Then if they decide they actually do want equality they can just add a new symbol '=' to their language, along with some axioms stating that it's an equivalence relation.

So if we write P_{\mathrm{w.e.}}(-|-) for probabilities without equality, we can imagine adding a binary relation = with no particular properties. Then the Principle of Indifference does apply, and by an argument similar to the one we used in propositional logic we get that

P_{\mathrm{w.e.}}(w=x|S:\mathrm{Set},\; w,x:S,\;=:S\times S\to\mathrm{Prop}) = \frac{1}{2}

and similarly if we have four variables w,x,y,z:S we would find that all 2^{4(4+1)/2} ways that they could behave with respect to = were equally likely.

Then we add back the condition that = is an equivalence relation on S. We find that of those 1024 relations, only 15 are actually equivalence relations. And of those 15, only 3 satisfy w=x\wedge y\neq z. So we have returned to logic-with-equality, and we have

P(w=x\wedge y\neq z|S:\mathrm{Set},\; w,x,y,z:S)
=P_{\mathrm{w.e.}}(w=x\wedge y\neq z|S:\mathrm{Set},\; w,x,y,z:S,\;=:S\times S\to\mathrm{Prop},\;=\text{ an equivalence relation})
=\frac{3}{15} = \frac{1}{5}
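
(For the suspicious: the counts 15 and 3 are small enough to check by brute force. A quick Python sketch, where the encoding of symmetric relations as subsets of the ten unordered pairs is just my bookkeeping:)

```python
from itertools import combinations_with_replacement, product

elements = range(4)  # w, x, y, z
# A symmetric relation is a choice of subset of the 4(4+1)/2 = 10 unordered pairs.
upairs = list(combinations_with_replacement(elements, 2))

def is_equivalence(rel):
    reflexive = all((i, i) in rel for i in elements)
    transitive = all((i, k) in rel
                     for (i, j) in rel for (j2, k) in rel if j2 == j)
    return reflexive and transitive  # symmetry holds by construction

equivs = []
for bits in product([False, True], repeat=len(upairs)):  # 2^10 = 1024 relations
    rel = {pair for pair, b in zip(upairs, bits) if b}
    rel |= {(j, i) for (i, j) in rel}                    # symmetrize
    if is_equivalence(rel):
        equivs.append(rel)

w, x, y, z = 0, 1, 2, 3
print(len(equivs))                                                  # 15
print(len([r for r in equivs if (w, x) in r and (y, z) not in r]))  # 3
```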

view this post on Zulip Oscar Cunningham (Nov 01 2024 at 23:35):

What about the semantic side?

P(w=x\wedge y\neq z\|S:\mathrm{Set},\; w,x,y,z:S)

How do we relate first-order logic to actual random variables?

We can imagine that if we actually knew the size of S then things would be easier. Suppose S has a finite cardinality s, and that w, x, y and z are chosen from it uniformly at random. Then there's a 1/s chance that w=x and an independent (s-1)/s chance that y\neq z. So overall our chance of w=x\wedge y\neq z would be (s-1)/s^2.

If we had a probability distribution for s we could then take a weighted average of (s-1)/s^2 over these probabilities to get our overall answer

\sum_sP(s)\frac{s-1}{s^2}

What probability distribution should we choose? To represent our lack of information about S, we want something like the 'uniform distribution over all cardinalities'. But sadly there is no uniform distribution on \mathbb{N}, never mind on all cardinalities.

The solution is to use the weighting from groupoid cardinality. We let the probability of a set S be inversely proportional to the size of its automorphism group. So in this case the weight of s is 1/s!. I won't explain why groupoid cardinality is a good idea, because there are already articles about it by Terry Tao, Qiaochu Yuan and John Baez. What hope do I have of explaining it better than the holy trinity? Suffice it to say that it gives a natural distribution on finite sets which is in a categorical sense 'uniform' while still converging. (And indeed we could apply it to infinite sets. But their automorphism groups are uncountably infinite, so we'll say they have probability 0.)

To get the weights 1/s! to behave as probabilities, we have to normalise them so that they sum to 1. Since \sum_s 1/s! = e, we get that the probability of s is \frac{1}{es!}.

There's one last thing we need to worry about: the type declaration w,x,y,z:S on the right hand side of our conditional probability. We are used to finding probabilities of propositions. But w,x,y,z:S is not a proposition; it is a different kind of judgement: a typing judgement. This hasn't caused us problems so far because it has stayed safely on the right hand side of the conditional, where we are merely conditioning on it rather than actually calculating its probability. But I want to apply the definition of conditional probability, which brings something from the right to the left of the conditional.

P(A|\Gamma,B) = \frac{P(A\wedge B|\Gamma)}{P(B|\Gamma)}

Taking a leaf out of the 'propositions as types' playbook, I'm going to say that P(w,x,y,z:S\|S:\mathrm{Set}) is the expected number of ways to pick w, x, y and z from S, i.e. the weighted average of s^4. This can be greater than 1, which is unusual for a probability, but we are taking the probability of a whole new kind of judgement, so it's fair for them not to behave exactly like propositions. Likewise P(w=x\wedge y\neq z,\;w,x,y,z:S\|S:\mathrm{Set}) will be the average number of ways to pick w, x, y and z such that w=x\wedge y\neq z is satisfied, i.e. the weighted average of s^2(s-1).

Now we can do our calculation:

P(w=x\wedge y\neq z\|S:\mathrm{Set},\; w,x,y,z:S)

= \frac{P(w=x\wedge y\neq z,\;w,x,y,z:S\|S:\mathrm{Set})}{P(w,x,y,z:S\|S:\mathrm{Set})}

=\frac{\sum_s\frac{s^2(s-1)}{es!}}{\sum_s\frac{s^4}{es!}}

Finally we bring out our secret weapon: Dobiński's formula. This says that \sum_s \frac{s^k}{es!} is B_k. The Bell number B_k is defined as the number of ways to partition a k-element set. So here we have

\frac{\sum_s\frac{s^3-s^2}{es!}}{\sum_s\frac{s^4}{es!}}

=\frac{B_3-B_2}{B_4}

=\frac{5-2}{15}=\frac{1}{5}
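
(The partial sums converge quickly, so the whole computation is easy to check numerically. A short Python sketch, where `dobinski` is just my name for the truncated sum:)

```python
import math

def dobinski(k, terms=60):
    """Partial sum of Dobinski's formula, sum_s s^k/(e*s!), converging to B_k."""
    return sum(s**k / (math.e * math.factorial(s)) for s in range(terms))

print(dobinski(2), dobinski(3), dobinski(4))  # ~2, ~5, ~15: the Bell numbers
numerator = dobinski(3) - dobinski(2)         # weighted average of s^2(s-1)
denominator = dobinski(4)                     # weighted average of s^4
print(numerator / denominator)                # 0.2 = 1/5
```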

view this post on Zulip Oscar Cunningham (Nov 01 2024 at 23:37):

Now you might think that I've taken a few dodgy steps, either on the syntactic side, or the semantic side, or both. But isn't it interesting that the two completely different methods give exactly the same answer? The agreement between these two calculations is what convinced me that there was something significant to be found here.

view this post on Zulip Oscar Cunningham (Nov 01 2024 at 23:39):

(Tomorrow I'll write up assigning probabilities to some formulas with quantifiers in them.)

view this post on Zulip David Corfield (Nov 02 2024 at 07:47):

Oscar Cunningham said:

It seems that Jon Williamson was also trying to extend logic to probabilities.

Just to point out that Jon, an (ex)-colleague at Kent of many years, developed ideas on inductive logic (cf. his book).

view this post on Zulip John Baez (Nov 02 2024 at 17:17):

This is fascinating stuff, @Oscar Cunningham. If you can check that these assignments of probabilities are 'coherent' in the usual sense (e.g. the probabilities of exhaustive and mutually exclusive events sum to 1), that would be really great. I think that any systematic and elegant way of coherently assigning probabilities to mathematical propositions is interesting - I wouldn't worry too much about what it "really means", or more precisely I hope such worries don't stop anyone from developing such a theory!

Even 'unnormalized probabilities' are interesting:

This can be greater than 1, which is unusual for a probability, but we are taking the probability of a whole new kind of judgement, so it's fair for them not to behave exactly like propositions.

In physics we often use 'unnormalized probabilities', i.e. nonnegative numbers that needn't sum to 1. For example if a system has some finite set of states and the energy of the i-th state is E_i, in thermal equilibrium at temperature T the unnormalized probability of finding the system in the i-th state is

\exp(-E_i/k T)

where k is Boltzmann's constant. Unnormalized probabilities add and multiply the way probabilities do - they add for exhaustive and mutually exclusive events, and multiply for independent events. We extract probabilities from them only at the very end of our calculations, by normalizing them.
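
To make that concrete, here is a minimal sketch in Python; the three energy levels are made up purely for illustration:

```python
import math

# A hypothetical three-level system, in units where kT = 1.
energies = [0.0, 1.0, 2.0]
weights = [math.exp(-E) for E in energies]  # unnormalized probabilities

# The weights add for exclusive events and multiply for independent ones;
# we normalize only at the end, dividing by the partition function Z.
Z = sum(weights)
print([w / Z for w in weights])  # ~[0.665, 0.245, 0.090]
```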

I've talked a bit to @Tobias Fritz about this: like me he wants to take these unnormalized probabilities quite seriously, in part because they're governed by the math of convex cones, which is better behaved (or at least simpler) than the math of convex sets.

view this post on Zulip John Baez (Nov 02 2024 at 17:23):

Btw, I love Dobiński's formula and explained a proof here, where I used it to answer this puzzle:

Suppose raindrops are falling on your head, randomly and independently, at an average rate of one per minute. What’s the average of the cube of the number of raindrops that fall on your head in one minute?
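
(Spoiler: the raindrop count is Poisson with mean 1, so by Dobiński's formula the answer is the Bell number B_3 = 5. A one-line numerical check:)

```python
import math

# E[N^3] for N ~ Poisson(1), i.e. Dobinski's sum with k = 3.
print(sum(n**3 * math.exp(-1) / math.factorial(n) for n in range(60)))  # 5.0
```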

view this post on Zulip Oscar Cunningham (Nov 03 2024 at 21:12):

I should have said that the strange greater-than-1 probabilities only appear when we take the probability of a type declaration. When we take the probability of a proposition it always comes out between 0 and 1, even if type declarations were used along the way. Like above, when we worked out P(w=x\wedge y\neq z\|S:\mathrm{Set},\; w,x,y,z:S) as

\frac{P(w=x\wedge y\neq z,\;w,x,y,z:S\|S:\mathrm{Set})}{P(w,x,y,z:S\|S:\mathrm{Set})}

the numerator and denominator were 3 and 15, both greater than 1, but their quotient was a sane 1/5. If A is some proposition then the probability P(A\|\Gamma), calculated as P(A,\Gamma)/P(\Gamma), is always at most 1, because the \Gamma-structures satisfying A form a subset of the set of all \Gamma-structures. Because of this I don't think we ever have to do any normalization. We just have to get used to the fact that a probability conditional on a type judgement is calculated as the ratio of two expected values, rather than of two mere probabilities.

Regarding a coherence proof, I hope that this will turn out to be trivial from the semantic point of view, since the values there represent actual probabilities. Then proving it from the syntactic point of view will be a matter of proving a completeness theorem saying that the syntactic and semantic probabilities are the same. The problem will be that from the semantic point of view there can be contexts where the expected values don't converge. Like

P(f(x)=x\|S:\mathrm{Set},\;f:S\to S,\;x:S)=\frac{\sum_s\frac{s^{s-1}s}{s!}}{\sum_s\frac{s^{s}s}{s!}}

To prove a completeness theorem I will have to deal with these, or more likely find a way to say something useful despite not dealing with them. More on that later.

view this post on Zulip Oscar Cunningham (Nov 03 2024 at 22:02):

The other thing I should point out is that we can use the reciprocal of the number of automorphisms to work out the probabilities of any structure, not just sets. Suppose we have a structure G on some set S of size s. Then any permutation of S acts on G to give us another structure on S, isomorphic to G. But if the permutation is actually an automorphism of G then this structure will just be G again. So the total number of structures isomorphic to G is s!/\mathrm{Aut}(G). This multiplies with the 1/s! factor we have on sets to give 1/\mathrm{Aut}(G).

For example, consider sets equipped with a predicate: \Gamma=(S:\mathrm{Set},\; q:S\to \mathrm{Prop}). One way to weigh these would be to consider that a set of size s has 2^s possible predicates on it, so the total weight over all sets would be \sum_s\frac{2^s}{s!}=e^2. The other way would be to say that up to isomorphism a set with a predicate is determined by the size of the set and the number of elements for which the predicate is true. An automorphism of such a set is given by a permutation of the true elements and a permutation of the false elements. So the total weight is \sum_s \sum_r \frac{1}{r!(s-r)!}, which one can prove using the binomial theorem is also e^2.
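
(Both ways of weighing are easy to verify numerically; a short Python sketch:)

```python
import math

# Total weight of (set, predicate) contexts, computed both ways.
by_labelled = sum(2**s / math.factorial(s) for s in range(40))
by_iso_class = sum(1 / (math.factorial(r) * math.factorial(s - r))
                   for s in range(40) for r in range(s + 1))
print(by_labelled, by_iso_class, math.e**2)  # all ~7.389056
```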

An important consequence of this is that the probabilities don't depend on how we describe a given structure. Two equivalent definitions will give the same automorphisms and hence the same probabilities. For example we could define a group in terms of just its multiplication operation, satisfying a bunch of axioms including the existence of an identity. Or we could define a group in terms of a set equipped with multiplication, inversion and an identity. Either way a given group will have the same automorphism group and hence the same probability. We can use this to easily calculate that a group of size 6 has probability 3/4 of being abelian: S_3 has 6 automorphisms while C_6 has only 2, so their weights are 1/6 and 1/2, and (1/2)/(1/6+1/2)=3/4.

view this post on Zulip Oscar Cunningham (Nov 03 2024 at 22:48):

I said I'd say something about formulas with quantifiers in them. The simplest such formula is (\forall x:S)\bot. This is true only when S is empty. From the semantic point of view the probability is easy to calculate: the \mathrm{Pois}(1) distribution gives 0 the probability 1/e, so P((\forall x:S)\bot\|S:\mathrm{Set})=1/e.

From the syntactic point of view things are harder. The first thing to notice is that 1/e is transcendental. So it will be impossible to derive a probability of 1/e by starting with simple axioms and taking finitely many steps ('simple' meaning not mentioning e directly). I will eventually deal with this, but first let's focus on something easier. I will show that the probability of a set having size n+1 is n+1 times smaller than the probability that it has size n. Again we will use the Principle of Indifference and the principle that definitions do not change probabilities.

We want to calculate P(\mathrm{card}(S)=n|S:\mathrm{Set}). The expression '\mathrm{card}(S)=n' is shorthand for some formula equivalent to the statement that S has n elements. (Usually I like to write |S| for the size of a set, but it's confusing with the vertical bars also being used for conditionals.) We begin by defining a new set T as the coproduct S+1:

P(\mathrm{card}(S)=n|S:\mathrm{Set})

=P(\mathrm{card}(S)=n|S,T:\mathrm{Set},\;f:S\to T,\;x:T,\;f\text{ injective},\;(\forall t:T)(t=x\vee (\exists s:S) f(s)=t))

In this context, \mathrm{card}(S)=n is equivalent to \mathrm{card}(T)=n+1.

P(\mathrm{card}(S)=n|S:\mathrm{Set})

=P(\mathrm{card}(T)=n+1|S,T:\mathrm{Set},\;f:S\to T,\;x:T,\;f\text{ injective},\;(\forall t:T)(t=x\vee (\exists s:S) f(s)=t))

Then we note that this is equivalent to a definition of S as the subset of T containing the elements not equal to x.

P(\mathrm{card}(S)=n|S:\mathrm{Set})=P(\mathrm{card}(T)=n+1|T:\mathrm{Set},\;x:T)

Note that this has left us with an 'x:T' on the right hand side. Bayes' Theorem lets us bring it to the left.

P(\mathrm{card}(S)=n|S:\mathrm{Set})=\frac{P(x:T|T:\mathrm{Set},\;\mathrm{card}(T)=n+1)P(\mathrm{card}(T)=n+1|T:\mathrm{Set})}{P(x:T|T:\mathrm{Set})}

The expected size of a set with cardinality n+1 is n+1, so we have P(x:T|T:\mathrm{Set},\;\mathrm{card}(T)=n+1)=n+1. The denominator P(x:T|T:\mathrm{Set}) is harder to deal with, so for now let us abbreviate this factor as \lambda. So we have derived

P(\mathrm{card}(S)=n|S:\mathrm{Set})=\frac{(n+1)P(\mathrm{card}(T)=n+1|T:\mathrm{Set})}{\lambda}

By a recursion to n=0 we have:

P(\mathrm{card}(S)=n|S:\mathrm{Set})=P(\mathrm{card}(S)=0|S:\mathrm{Set})\frac{\lambda^n}{n!}

A Poisson distribution!
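
(Taking on faith for the moment that \lambda=1, which I'll justify next, the semantic weights 1/(es!) satisfy exactly this recursion. A quick numerical sketch:)

```python
import math

# Semantic weights P(card(S) = n) = 1/(e * n!): the Poisson(1) distribution.
P = [1 / (math.e * math.factorial(n)) for n in range(20)]

print(abs(sum(P) - 1) < 1e-9)  # normalized, up to the truncated tail
print(P[0], 1 / math.e)        # P(card = 0) = 1/e
# The derived recursion P(n) = (n+1) * P(n+1) / lambda, with lambda = 1:
print(all(math.isclose(P[n], (n + 1) * P[n + 1]) for n in range(19)))
```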

view this post on Zulip Oscar Cunningham (Nov 03 2024 at 22:49):

(Next I will explain how to deal with the fact that 1/e is transcendental, and how to fix \lambda as 1.)

view this post on Zulip Matteo Capucci (he/him) (Nov 04 2024 at 09:26):

Very cool stuff! I suspect what you're doing is constructing a measure/probability measure on the space of formulae for a given language, such that it respects congruence and basic substitution rules?

view this post on Zulip Matteo Capucci (he/him) (Nov 04 2024 at 09:29):

The fact such spaces are 'finitely generated' in some sense makes it possible to build this measure inductively, sort of.

view this post on Zulip Matteo Capucci (he/him) (Nov 04 2024 at 09:30):

The fact you can wing higher-order stuff with groupoid cardinality is really fascinating