Category Theory
Zulip Server
Archive

You're reading the public-facing archive of the Category Theory Zulip server.
To join the server you need an invite. Anybody can get an invite by contacting Matteo Capucci at name dot surname at gmail dot com.
For all things related to this archive refer to the same person.


Stream: community: our work

Topic: Oscar Cunningham


view this post on Zulip Oscar Cunningham (Nov 01 2024 at 15:09):

I've discovered an interesting way to extend first-order logic to assign 'probabilities' to logical formulas. For example we get the following values:

P(w=x\wedge y\neq z|S:\mathrm{Set},\; w,x,y,z:S) = \frac{1}{5}
P(f\text{ injective}|S,T:\mathrm{Set},\;f:S\to T,\;f\text{ surjective}) = \frac{1}{e^{e-2}}

Has anything like this already been discovered?

view this post on Zulip Oscar Cunningham (Nov 01 2024 at 15:16):

The idea is to treat conditional probabilities P(A|\Gamma) as analogous to sequents \Gamma\vdash A. Irritatingly the conventions change the order of \Gamma and A. I also have a semantic notion which I'll write as P(A\|\Gamma) by analogy to \Gamma\vDash A. One of the interesting things will be that the syntax and semantics agree with each other despite not initially seeming related.

view this post on Zulip Oscar Cunningham (Nov 01 2024 at 15:40):

The syntactic side of things uses the Principle of Indifference, which says that if we don't know anything to distinguish two possibilities, then we should assign to them the same probability. I interpret this formally by a rule that says that \alpha-equivalent sequents have the same probability.

view this post on Zulip Peva Blanchard (Nov 01 2024 at 15:41):

This reminds me of this paper (logical induction), but I never managed to find the time to read it thoroughly.

Also, Solomonoff tried to define a "universal prior" in his theory of inductive inference

view this post on Zulip Oscar Cunningham (Nov 01 2024 at 15:47):

Ah yes, those are definitely related. A logical inductor also assigns probabilities to logical statements, but my understanding is that the probabilities it assigns to undecidable statements can vary depending on arbitrary implementation details, whereas the method I'm developing gives specific values to them. Likewise the exact values given by Solomonoff induction depend on the implementation.

view this post on Zulip Oscar Cunningham (Nov 01 2024 at 15:51):

The other principle that the syntax uses is that introduction of a new symbol via a definition should not change the probabilities. Like if you're playing a game and wondering what card your opponent has, then deciding to call the red cards 'foo' and the black cards 'bar' doesn't change the probability they have the seven of clubs.

view this post on Zulip Oscar Cunningham (Nov 01 2024 at 15:51):

Already this allows you to make some deductions.

view this post on Zulip Oscar Cunningham (Nov 01 2024 at 16:03):

Let's do a simple example in propositional logic. We want to calculate the probability that some proposition A is true, given that we know nothing else about it. So we want to find

P(A|A:\mathrm{Prop}).

We can introduce a new proposition B by defining it as ¬A. Since definitions don't change probabilities we have

P(A|A:\mathrm{Prop})=P(A|A,B:\mathrm{Prop},\; B\leftrightarrow ¬A).

Now since we have B\leftrightarrow ¬A we can rewrite A as ¬B. Likewise we can rewrite B\leftrightarrow ¬A as the logically equivalent A\leftrightarrow ¬B.

P(A|A:\mathrm{Prop})=P(¬B|A,B:\mathrm{Prop},\; A\leftrightarrow ¬B).

Now the right hand side has the form of a definition of A, which we may remove:

P(A|A:\mathrm{Prop})=P(¬B|B:\mathrm{Prop}).

Finally by \alpha-equivalence we have

P(A|A:\mathrm{Prop})=P(¬A|A:\mathrm{Prop}).

Separately we have the law of probability

P(A|A:\mathrm{Prop})+P(¬A|A:\mathrm{Prop})=1.

Substituting one into the other yields

P(A|A:\mathrm{Prop})+P(A|A:\mathrm{Prop})=1

from which it follows that P(A|A:\mathrm{Prop}) = 1/2.

view this post on Zulip Oscar Cunningham (Nov 01 2024 at 16:13):

(I hope I've left people wondering how stupid tricks like the above can lead to a proof that the probability that a surjection is also injective is 1/e^{e-2}. I'll write more later.)

view this post on Zulip Oscar Cunningham (Nov 01 2024 at 21:42):

So having seen that P(A|A:\mathrm{Prop}) is just 1/2, you can probably guess what the semantics are for propositional logic. We just assign a fair coin flip to each atomic proposition and then take the usual probability. So for example P(A\|A,B:\mathrm{Prop},\;A\vee B)=2/3, since out of the four possible outcomes for two coin flips, three have A\vee B, and only two of those have A.
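This coin-flip semantics is easy to verify by brute-force enumeration. Here's a minimal Python sketch (my own illustration, not Oscar's notation; the helper `prob` is hypothetical) that recovers both values:

```python
from fractions import Fraction
from itertools import product

def prob(target, condition, atoms):
    """P(target | condition), giving each atomic proposition a fair coin flip."""
    worlds = [dict(zip(atoms, vals))
              for vals in product([False, True], repeat=len(atoms))]
    cond = [w for w in worlds if condition(w)]
    return Fraction(sum(target(w) for w in cond), len(cond))

# P(A | A:Prop): no conditioning at all.
print(prob(lambda w: w['A'], lambda w: True, ['A']))                   # 1/2
# P(A | A ∨ B): three of the four worlds satisfy A ∨ B, two of those have A.
print(prob(lambda w: w['A'], lambda w: w['A'] or w['B'], ['A', 'B']))  # 2/3
```

Exact fractions are used so the outputs match the syntactic derivation with no rounding.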

view this post on Zulip Oscar Cunningham (Nov 01 2024 at 21:44):

Using similar arguments as above we can show that the syntactic reasoning gives the same answers, so we have an analogue of the completeness theorem for propositional logic.

view this post on Zulip John Baez (Nov 01 2024 at 21:53):

Cool! When I see 1/e^{e-2} I think "groupoid cardinality".

The Machine Intelligence Research Institute (Yudkowsky's outfit) had a meeting on trying to assign probabilities to mathematical statements, and I blogged about it here and here. They did not arrive at specific formulas as you are doing. But if you publish something, it'd be worth citing some of their papers, and they might be interested in what you've done.

view this post on Zulip Oscar Cunningham (Nov 01 2024 at 22:01):

"groupoid cardinality" :smile: Good intuition!

Yeah, I grew up on LessWrong so I have a good idea of what they're interested in. I also remembered the Progic posts from the n-Category Cafe. It seems that Jon Williamson was also trying to extend logic to probabilities. My plan is to write some things up here as a sounding board and then recompile them more formally as something to link people to. I'm not in academia so I probably won't publish an actual paper.

view this post on Zulip Oscar Cunningham (Nov 01 2024 at 22:25):

Moving on to first-order logic, we can consider the example I gave in the first post.

P(w=x\wedge y\neq z|S:\mathrm{Set},\; w,x,y,z:S)

We're being asked the probability that some variables are equal or not to each other. We'd like to use the Principle of Indifference like we did before. The Principle applies whenever our information is symmetric between two propositions. But we know that = is an equivalence relation whereas \neq is not, so we are not in fact indifferent between equality and its negation. We can't, for example, conclude that w=x has the same probability as w\neq x.

The trick is to forget temporarily what we know about equality, and then remember it again. People sometimes work in first-order logic without equality. Then if they decide they actually do want equality they can just add a new symbol '=' to their language, and add some axioms stating that it's an equivalence relation.

So if we write P_{\mathrm{w.e.}}(-|-) for probabilities without equality, we can imagine adding a binary relation = with no particular properties. Then the Principle of Indifference does apply, and using a similar argument to what we did in propositional logic we would get that

P_{\mathrm{w.e.}}(w=x|S:\mathrm{Set},\; w,x:S,\;=:S\times S\to\mathrm{Prop}) = \frac{1}{2}

and similarly if we have four variables w,x,y,z:S we would find that all 2^{4(4+1)/2} ways that they could behave with respect to = were equally likely.

Then we add back the condition that = is an equivalence relation on S. We find that of those 1024 relations, only 15 are actually equivalence relations. And of those 15, only 3 satisfy w=x\wedge y\neq z. So we have returned to logic-with-equality, and we have

P(w=x\wedge y\neq z|S:\mathrm{Set},\; w,x,y,z:S)
=P_{\mathrm{w.e.}}(w=x\wedge y\neq z|S:\mathrm{Set},\; w,x,y,z:S,\;=:S\times S\to\mathrm{Prop},\;=\text{ an equivalence relation})
=\frac{3}{15} = \frac{1}{5}
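The counts 1024, 15 and 3 are easy to check by brute force. A small Python sketch (an illustration, not part of the argument) that enumerates all 2^10 symmetric relations on four elements and filters down to the equivalence relations:

```python
from itertools import combinations_with_replacement, product

elems = list(range(4))                                 # stand-ins for w, x, y, z
pairs = list(combinations_with_replacement(elems, 2))  # 4*5/2 = 10 unordered pairs

n_equiv = n_target = 0
for bits in product([False, True], repeat=len(pairs)):  # 2^10 = 1024 symmetric relations
    rel = dict(zip(pairs, bits))
    r = lambda a, b: rel[(min(a, b), max(a, b))]
    reflexive = all(r(a, a) for a in elems)
    transitive = all(r(a, c) or not (r(a, b) and r(b, c))
                     for a in elems for b in elems for c in elems)
    if reflexive and transitive:
        n_equiv += 1
        w, x, y, z = elems
        if r(w, x) and not r(y, z):                     # w = x and y ≠ z
            n_target += 1

print(n_equiv, n_target)  # 15 3
```

The 15 equivalence relations correspond to the 15 partitions of a 4-element set (the Bell number B_4), foreshadowing Dobiński's formula below.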

view this post on Zulip Oscar Cunningham (Nov 01 2024 at 23:35):

What about the semantic side?

P(w=x\wedge y\neq z\|S:\mathrm{Set},\; w,x,y,z:S)

How do we relate first-order logic to actual random variables?

We can imagine that if we actually knew the size of S then things would be easier. Suppose S has a finite cardinality s, and that w, x, y and z are chosen from it uniformly at random. Then there's a 1/s chance that w=x and an (s-1)/s chance that y\neq z. So overall our chance of w=x\wedge y\neq z would be (s-1)/s^2.

If we had a probability distribution for s we could then take a weighted average of (s-1)/s^2 over these probabilities to get our overall answer

\sum_s P(s)\frac{s-1}{s^2}

What probability distribution should we choose? To represent our lack of information about S, we want something like the 'uniform distribution over all cardinalities'. But sadly there is no uniform distribution on \mathbb{N}, never mind on all cardinalities.

The solution is to use the weighting from the groupoid cardinality. We let the probability of a set S be inversely proportional to the size of its automorphism group. So in this case the weight of s is 1/s!. I won't explain why groupoid cardinality is a good idea, because there are already articles about it by Terry Tao, Qiaochu Yuan and John Baez. What hope do I have of explaining it better than the holy trinity? Suffice it to say that it gives a natural distribution on finite sets which is in a categorical sense 'uniform' while still converging. (And indeed we could apply it to infinite sets. But their automorphism groups are uncountably infinite, so we'll say they have probability 0.)

To get the weights 1/s! to behave as probabilities, we have to normalise them so that they sum to 1. Since \sum_s 1/s! = e, we get that the probability of s is \frac{1}{es!}.

There's one last thing we need to worry about: the type declaration w,x,y,z:S on the right hand side of our conditional probability. We are used to finding probabilities of propositions. But w,x,y,z:S is not a proposition; it is a different kind of judgement: a typing judgement. This hasn't caused us problems so far because it has stayed safely on the right hand side of the conditional, where we are merely conditioning on it rather than actually calculating its probability. But I want to apply the definition of conditional probability, which brings something from the right to the left of the conditional.

P(A|\Gamma,B) = \frac{P(A\wedge B|\Gamma)}{P(B|\Gamma)}

Taking a leaf out of the 'propositions as types' playbook, I'm going to say that P(w,x,y,z:S\|S:\mathrm{Set}) is the expected number of ways to pick w, x, y and z from S, i.e. the weighted average of s^4. This can be greater than 1, which is unusual for a probability, but we are taking the probability of a whole new kind of judgement, so it's fair for them not to behave exactly like propositions. Likewise P(w=x\wedge y\neq z,\;w,x,y,z:S\|S:\mathrm{Set}) will be the average number of ways to pick w, x, y and z such that w=x\wedge y\neq z is satisfied, i.e. the weighted average of s^2(s-1).

Now we can do our calculation:

P(w=x\wedge y\neq z\|S:\mathrm{Set},\; w,x,y,z:S)

= \frac{P(w=x\wedge y\neq z,\;w,x,y,z:S\|S:\mathrm{Set})}{P(w,x,y,z:S\|S:\mathrm{Set})}

=\frac{\sum_s\frac{s^2(s-1)}{es!}}{\sum_s\frac{s^4}{es!}}

Finally we bring out our secret weapon: Dobiński's formula. This says that \sum_s \frac{s^k}{es!} is B_k. The Bell number B_k is defined as the number of ways to partition a k-element set. So here we have

\frac{\sum_s\frac{s^3-s^2}{es!}}{\sum_s\frac{s^4}{es!}}

=\frac{B_3-B_2}{B_4}

=\frac{5-2}{15}=\frac{1}{5}
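Dobiński's formula is easy to confirm numerically. A short Python sketch (my own check, truncating the series at 60 terms, far more than enough for convergence):

```python
from math import e, factorial

def dobinski(k, terms=60):
    """Partial sum of Dobinski's series, sum_s s^k / (e * s!), which equals
    the Bell number B_k in the limit."""
    return sum(s**k / (e * factorial(s)) for s in range(terms))

# Bell numbers: B_2 = 2, B_3 = 5, B_4 = 15
print(round(dobinski(2)), round(dobinski(3)), round(dobinski(4)))  # 2 5 15
# The probability itself: (B_3 - B_2) / B_4 = 1/5
print((dobinski(3) - dobinski(2)) / dobinski(4))                   # ≈ 0.2
```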

view this post on Zulip Oscar Cunningham (Nov 01 2024 at 23:37):

Now you might think that I've taken a few dodgy steps. Either on the syntactic side, or the semantic side, or both. But isn't it interesting that the two completely different methods give exactly the same answer? The agreement between these two calculations is what convinced me that there was something significant to be found here.

view this post on Zulip Oscar Cunningham (Nov 01 2024 at 23:39):

(Tomorrow I'll write up assigning probabilities to some formulas with quantifiers in them.)

view this post on Zulip David Corfield (Nov 02 2024 at 07:47):

Oscar Cunningham said:

It seems that Jon Williamson was also trying to extend logic to probabilities.

Just to point out that Jon, an (ex)-colleague at Kent of many years, developed ideas on inductive logic (cf. his book).

view this post on Zulip John Baez (Nov 02 2024 at 17:17):

This is fascinating stuff, @Oscar Cunningham. If you can check that these assignments of probabilities are 'coherent' in the usual sense (e.g. the probability of exhaustive and mutually exclusive events sums to 1), that would be really great. I think that any systematic and elegant way of coherently assigning probabilities to mathematical propositions is interesting - I wouldn't worry too much about what it "really means", or more precisely I hope such worries don't stop anyone from developing such a theory!

Even 'unnormalized probabilities' are interesting:

This can be greater than 1, which is unusual for a probability, but we are taking the probability of a whole new kind of judgement, so it's fair for them not to behave exactly like propositions.

In physics we often use 'unnormalized probabilities', i.e. nonnegative numbers that needn't sum to 1. For example if a system has some finite set of states and the energy of the i-th state is E_i, in thermal equilibrium at temperature T the unnormalized probability of finding the system in the i-th state is

\exp(-E_i/kT)

where k is Boltzmann's constant. Unnormalized probabilities add and multiply the way probabilities do - they add for exhaustive and mutually exclusive events, and multiply for independent events. We extract probabilities from them only at the very end of our calculations, by normalizing them.

I've talked a bit to @Tobias Fritz about this: like me he wants to take these unnormalized probabilities quite seriously, in part because they're governed by the math of convex cones, which is better behaved (or at least simpler) than the math of convex sets.

view this post on Zulip John Baez (Nov 02 2024 at 17:23):

Btw, I love Dobinski's formula and explained a proof here, where I used it to answer this puzzle:

Suppose raindrops are falling on your head, randomly and independently, at an average rate of one per minute. What’s the average of the cube of the number of raindrops that fall on your head in one minute?

view this post on Zulip Oscar Cunningham (Nov 03 2024 at 21:12):

I should have said that the strange greater-than-1 probabilities only appear when we take the probability of a type declaration. When we take the probability of a proposition it always comes out between 0 and 1, even if type declarations were used along the way. Like above when we worked out P(w=x\wedge y\neq z\|S:\mathrm{Set},\; w,x,y,z:S) as

\frac{P(w=x\wedge y\neq z,\;w,x,y,z:S\|S:\mathrm{Set})}{P(w,x,y,z:S\|S:\mathrm{Set})}

the numerator and denominator were 3 and 15, both greater than 1, but their quotient was a sane 1/5. If A is some proposition then the probability P(A\|\Gamma), calculated as P(A,\Gamma)/P(\Gamma), is always less than 1 because the \Gamma-structures satisfying A form a subset of the set of all \Gamma-structures. Because of this I don't think we ever have to do any normalization. We just have to get used to the fact that a probability conditional on a type judgement is calculated as the ratio of two expected values, rather than two mere probabilities.

Regarding a coherence proof, I hope that this will turn out to be trivial from the semantic point of view, since the values there represent actual probabilities. Then proving it from the syntactic point of view will be a matter of proving a completeness theorem saying that the syntactic and semantic probabilities are the same. The problem will be that from the semantic point of view there can be contexts where the expected values don't converge. Like

P(f(x)=x\|S:\mathrm{Set},\;f:S\to S,\;x:S)=\frac{\sum_s\frac{s^{s-1}s}{s!}}{\sum_s\frac{s^{s}s}{s!}}
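A quick numerical sketch (my own check, not part of the argument) makes the divergence concrete: the partial sums of both the numerator and the denominator grow without bound as more terms are added.

```python
from math import factorial

def partial_sums(terms):
    """Partial sums of numerator sum_s s^(s-1)*s/s! and denominator sum_s s^s*s/s!."""
    num = sum(s**(s - 1) * s / factorial(s) for s in range(1, terms))
    den = sum(s**s * s / factorial(s) for s in range(1, terms))
    return num, den

# Each term of the denominator grows roughly like e^s, so the sums blow up.
for n in (10, 20, 40):
    print(n, partial_sums(n))
```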

To prove a completeness theorem I will have to deal with these, or more likely find a way to say something useful despite not dealing with them. More on that later.

view this post on Zulip Oscar Cunningham (Nov 03 2024 at 22:02):

The other thing I should point out is that we can use the reciprocal of the number of automorphisms to work out the probabilities of any structure, not just sets. Suppose we have a structure G on some set S with size s. Then any permutation of S acts on G to give us another structure on S, isomorphic to G. But if the permutation is actually an automorphism of G then this structure will just be G again. So the total number of structures isomorphic to G is s!/\mathrm{Aut}(G). This multiplies with the 1/s! factor we have on sets to give 1/\mathrm{Aut}(G).

For example, consider sets equipped with a predicate: \Gamma=(S:\mathrm{Set},\; q:S\to \mathrm{Prop}). One way to weigh these would be to consider that a set of size s has 2^s possible predicates on it, so the total weight of them over all sets would be \sum_s\frac{2^s}{s!}=e^2. The other way to weigh them would be to say that up to isomorphism a set with a predicate is determined by the size of the set and the number of elements for which the predicate is true. An automorphism of such a set is given by a permutation of the true elements and a permutation of the false elements. So the total weight is \sum_s \sum_r \frac{1}{r!(s-r)!}, which one can prove using the binomial theorem is also e^2.
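Both weighings are easy to confirm numerically; a short Python sketch (my own check, truncating both series at 30 terms):

```python
from math import exp, factorial

N = 30  # truncation point; both series converge very rapidly

# Weigh by predicates: each set of size s carries 2^s predicates, weight 1/s! each.
by_predicates = sum(2**s / factorial(s) for s in range(N))

# Weigh by isomorphism classes: a class is (s, r) with r 'true' elements,
# and its weight is 1/|Aut| = 1/(r! * (s-r)!).
by_iso_classes = sum(1 / (factorial(r) * factorial(s - r))
                     for s in range(N) for r in range(s + 1))

print(by_predicates, by_iso_classes, exp(2))  # all three ≈ 7.389
```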

An important consequence of this is that the probabilities don't depend on how we describe a given structure. Two equivalent definitions will have the same number of automorphisms and hence the same probabilities. For example we could define a group in terms of just its multiplication operation, satisfying a bunch of axioms including the existence of an identity. Or we could define a group as a set equipped with multiplication, inversion and an identity. Either way a given group will have the same automorphism group and hence the same probability. We can use this to easily calculate that a group of size 6 has probability 3/4 of being abelian, because S_3 has 6 automorphisms while C_6 has only 2.
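The order-6 calculation is just a two-line weighted average; sketched in Python with exact fractions (the automorphism counts |Aut(C_6)| = 2 and |Aut(S_3)| = 6 are standard facts):

```python
from fractions import Fraction

# The two isomorphism classes of groups of order 6, each weighted by 1/|Aut|:
# C6 is abelian with |Aut(C6)| = phi(6) = 2; S3 is non-abelian with |Aut(S3)| = 6.
weight = {'C6': Fraction(1, 2), 'S3': Fraction(1, 6)}

p_abelian = weight['C6'] / (weight['C6'] + weight['S3'])
print(p_abelian)  # 3/4
```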

view this post on Zulip Oscar Cunningham (Nov 03 2024 at 22:48):

I said I'd say something about formulas with quantifiers in them. The simplest such formula is (\forall x:S)\bot. This is true only when S is empty. From the semantic point of view the probability is easy to calculate: the \mathrm{Pois}(1) distribution gives 0 the probability 1/e, so P((\forall x:S)\bot\|S:\mathrm{Set})=1/e.

From the syntactic point of view things are harder. The first thing to notice is that 1/e is transcendental. So it will be impossible to derive a probability of 1/e by starting with simple axioms and taking finitely many steps ('simple' meaning not mentioning e directly). I will eventually deal with this, but first let's focus on something easier. I will show that the probability of a set having size n+1 is n+1 times less than the probability that a set has size n. Again we will use the Principle of Indifference and the principle that definitions do not change probabilities.

We want to calculate P(\mathrm{card}(S)=n|S:\mathrm{Set}). The expression '\mathrm{card}(S)=n' is shorthand for some formula equivalent to the statement that S has n elements. (Usually I like to write |S| for the size of a set, but that's confusing with the vertical bars also being used for conditionals.) We begin by defining a new set T as the coproduct S+1:

P(\mathrm{card}(S)=n|S:\mathrm{Set})

=P(\mathrm{card}(S)=n|S,T:\mathrm{Set},\;f:S\to T,\;x:T,\;f\text{ injective},\;(\forall t:T)(t=x\vee (\exists s:S) f(s)=t))

In this context, \mathrm{card}(S)=n is equivalent to \mathrm{card}(T)=n+1.

P(\mathrm{card}(S)=n|S:\mathrm{Set})

=P(\mathrm{card}(T)=n+1|S,T:\mathrm{Set},\;f:S\to T,\;x:T,\;f\text{ injective},\;(\forall t:T)(t=x\vee (\exists s:S) f(s)=t))

Then we note that this context is equivalent to a definition of S as the subset of T containing the elements not equal to x.

P(\mathrm{card}(S)=n|S:\mathrm{Set})=P(\mathrm{card}(T)=n+1|T:\mathrm{Set},\;x:T)

Note that this has left us with an 'x:T' on the right hand side. Bayes' Theorem lets us bring it to the left.

P(\mathrm{card}(S)=n|S:\mathrm{Set})=\frac{P(x:T|T:\mathrm{Set},\;\mathrm{card}(T)=n+1)P(\mathrm{card}(T)=n+1|T:\mathrm{Set})}{P(x:T|T:\mathrm{Set})}

The expected size of a set with cardinality n+1 is n+1, so we have P(x:T|T:\mathrm{Set},\;\mathrm{card}(T)=n+1)=n+1. The denominator P(x:T|T:\mathrm{Set}) is harder to deal with, so for now let us abbreviate this factor as \lambda. So we have derived

P(\mathrm{card}(S)=n|S:\mathrm{Set})=\frac{(n+1)P(\mathrm{card}(T)=n+1|T:\mathrm{Set})}{\lambda}

By a recursion down to n=0 we have:

P(\mathrm{card}(S)=n|S:\mathrm{Set})=P(\mathrm{card}(S)=0|S:\mathrm{Set})\frac{\lambda^n}{n!}

A Poisson distribution!
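Taking λ = 1 (which Oscar says he will fix later), normalisation forces P(card(S)=0) = 1/e, and the resulting distribution P(card(S)=n) = 1/(e·n!) is exactly Poisson with rate 1. A quick numerical sketch:

```python
from math import e, factorial

# P(card(S)=n) = P(card(S)=0) * λ^n / n! with λ = 1; normalizing so the
# probabilities sum to 1 forces P(card(S)=0) = 1/e, i.e. Poisson(1).
probs = [1 / (e * factorial(n)) for n in range(60)]

print(sum(probs))  # ≈ 1.0
print(probs[0])    # ≈ 0.3679, i.e. 1/e
```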

view this post on Zulip Oscar Cunningham (Nov 03 2024 at 22:49):

(Next I will explain how to deal with the fact that 1/e is transcendental, and how to fix \lambda as 1.)

view this post on Zulip Matteo Capucci (he/him) (Nov 04 2024 at 09:26):

Very cool stuff! I suspect what you're doing is constructing a measure/probability measure on the space of formulae for a given language, such that it respects congruence and basic substitution rules?

view this post on Zulip Matteo Capucci (he/him) (Nov 04 2024 at 09:29):

The fact such spaces are 'finitely generated' in some sense makes it possible to build this measure inductively, sort of.

view this post on Zulip Matteo Capucci (he/him) (Nov 04 2024 at 09:30):

The fact you can wing higher-order stuff with groupoid cardinality is really fascinating

view this post on Zulip John Baez (Nov 04 2024 at 16:43):

Matteo Capucci (he/him) said:

Very cool stuff! I suspect what you're doing is constructing a measure/probability measure on the space of formulae for a given language, such that it respects congruence and basic substitution rules?

This is what he's aiming to do. As mentioned, the AI crowd at the Machine Intelligence Research Institute have written papers about such probability measures. The basic idea:

How should a rational agent assign probabilities to statements in mathematics? Of course an omniscient being could assign

probability 1 to every mathematical statement that’s provable,

probability 0 to every statement whose negation is provable,

and

probability ??? to every statement that is neither provable nor disprovable.

But a real-world rational agent will never have time to check all proofs, so there will always be lots of statements it’s not sure about. Actual mathematicians always have conjectures, like the Twin Prime Conjecture, that we consider plausible even though nobody has proved them. And whenever we do research, we’re constantly estimating how likely it is for statements to be true, and changing our estimates as new evidence comes in. In other words, we use scientific induction in mathematics.

How could we automate this? Most of us don’t consciously assign numerical probabilities to mathematical statements. But maybe an AI mathematician should. If so, what rules should it follow?

It's possible that Oscar's work could help here, by providing 'prior probabilities' that we could then update. (I'm actually not sure it does help, but it's a thought.)

view this post on Zulip Morgan Rogers (he/him) (Nov 04 2024 at 16:44):

Oscar Cunningham said:

I'm not in academia so I probably won't publish an actual paper.

I don't see why you shouldn't ;)

view this post on Zulip John Baez (Nov 04 2024 at 16:46):

I was going to say that too. If this stuff actually works, and Oscar can either prove it works or assemble enough evidence to make a plausible conjecture, it really deserves a paper. This stuff looks cool!

I can't tell whether he was saying "since I'm not in academia I don't have time to go through the rigamarole of publishing a paper" or "since I'm not in academia probably no decent journal would publish a paper if I wrote one" - or both. If the latter, I hope that's not really true.

view this post on Zulip Morgan Rogers (he/him) (Nov 04 2024 at 16:47):

Do probability 1 statements necessarily correspond to tautologies, @Oscar Cunningham? If so, that sounds like a solid foundation for some kind of "statistical proof theory"

view this post on Zulip Alex Kreitzberg (Nov 04 2024 at 19:12):

But if you had some complicated statement that incidentally was a tautology, but wasn't obviously so, shouldn't the formalism not be able to assign a probability (edit: probability of value 1) to the statement?

Like in their example P(A|A : \text{Prop}) = \frac{1}{2}, shouldn't some tautology be a possible value of A, and this theorem then abstracts over this possibility?

Maybe I'm confused, is the idea that a statement has probability one after you establish its truth as a theorem, which tautologies are supposed to encode? I'm not very good at logic.

I really like the idea of preserving probability under \alpha-equivalence and definition introductions.

view this post on Zulip John Baez (Nov 04 2024 at 19:17):

Given the soundness and completeness of classical logic, 'theorems' are the same as 'tautologies'. So don't worry about the difference too much now, though it takes work to prove they're the same.

The hard problem is getting a reasonable way of assigning probabilities to statements that doesn't assign probability 1 to every theorem (i.e. every tautology). How do you decide which statements are tautologies 'but not obviously so'?

I'm not saying this is an unsolvable problem - far from it! I just haven't seen anyone solve it.

I don't think there should be a unique way to assign probabilities to statements. The way we assign probabilities can depend on our current knowledge. To turn this into math, we'd have to formalize the idea of a collection of 'known' statements. Maybe we should allow this to be an arbitrary set of statements. Once you start demanding that this set be closed under deduction rules, it tends to become infinite.

In simple terms: if we assume that you know all the consequences of what you know, then you know an infinite amount of stuff. So that's not realistic.

view this post on Zulip John Baez (Nov 04 2024 at 19:28):

But Oscar's ideas are interesting because they seem to avoid the issue of formalizing what's 'known' or 'obvious'.

view this post on Zulip Alex Kreitzberg (Nov 04 2024 at 19:36):

Yeah that makes sense to me.

It seems like a solution would be to assign a (probability) mass to all the points of a space, and it turns out the "space" in this context is the sentences of some formal language - which necessarily includes sentences we have reason to believe are true. If it's first order logic, that would include all the theorems.

You could know a theorem was true, but maybe you wanted to use this formalism to express a student's current understanding, so you'd still want to be able to assign a probability less than one to the theorem the student didn't understand yet. Then you could try to reason out what the student would think was likely, with the usual laws of probability.

view this post on Zulip Oscar Cunningham (Nov 23 2024 at 10:19):

Been busy for a couple of weeks, but I've got some time now.

view this post on Zulip Oscar Cunningham (Nov 23 2024 at 10:22):

Matteo Capucci (he/him) said:

Very cool stuff! I suspect what you're doing is constructing a measure/probability measure on the space of formulae for a given language, such that it respects congruence and basic substitution rules?

Yes, except to be precise I wouldn't say that I was constructing a probability measure on the space of formulae, because the formulae aren't the individual elements of the measure space but rather correspond to its subsets, the 'events'. The thing that I'm putting the measure on is the set of models, or from a syntactic point of view the set of complete theories.

view this post on Zulip Oscar Cunningham (Nov 23 2024 at 10:28):

Morgan Rogers (he/him) said:

Oscar Cunningham said:

I'm not in academia so I probably won't publish an actual paper.

I don't see why you shouldn't ;)

John Baez said:

I was going to say that too. If this stuff actually works, and Oscar can either prove it works or assemble enough evidence to make a plausible conjecture, it really deserves a paper. This stuff looks cool!

I can't tell whether he was saying "since I'm not in academia I don't have time to go through the rigamarole of publishing a paper" or "since I'm not in academia probably no decent journal would publish a paper if I wrote one" - or both. If the latter, I hope that's not really true.

It's a combination of these things. I certainly don't have much free time, as you can tell from the fact it's been two weeks since I last posted on here. The other factor is incentives. Since I'm not an academic there's not necessarily a professional advantage to having these ideas published in a paper. If I just want people to know about them then a blog post will serve equally well. It might be fun to have a paper, but I'm not sure it's worth the additional hours of chasing down references, battling with LaTeX, responding to reviewers, etc.

view this post on Zulip Oscar Cunningham (Nov 23 2024 at 10:32):

Morgan Rogers (he/him) said:

Do probability 1 statements necessarily correspond to tautologies, Oscar Cunningham? If so, that sounds like a solid foundation for some kind of "statistical proof theory"

That would be nice, but at the moment I don't think it will work that way. It's like when I throw a dart at a dartboard: any singleton set has area 0, so the event 'hit the exact centre' has probability 0, but it's not literally impossible to hit the centre. I have to hit some point. Likewise we might find formulae that have probability 1 but which aren't tautologies. Definitely all tautologies should have probability 1.

view this post on Zulip Oscar Cunningham (Nov 23 2024 at 10:42):

Alex Kreitzberg said:

But if you had some complicated statement that incidentally was a tautology, but wasn't obviously so, shouldn't the formalism not be able to assign a probability (edit: probability of value 1) to the statement?

I'm not trying to do the same thing as MIRI's Logical Inductors paper, where they have probabilities that slowly update as you do more calculations. In their formalism P(10^{20}\text{th digit of }\pi\text{ is a }7) is close to 1/10 to begin with, and then changes to 0 or 1 after you've done enough calculation to determine what the digit is.

I'm not worrying about how long it takes to do calculations, but instead assigning probabilities to formulas that are undecidable. For example, given only that x and y are elements of some set S, it's undecidable whether or not x=y. No amount of further calculation will tell you whether they're equal or not. Nevertheless I claim that we can say that given this state of knowledge it's correct to assign the formula 'x=y' a probability of 1/2.

view this post on Zulip Oscar Cunningham (Nov 23 2024 at 10:48):

My system will assign all theorems probability 1, all negations of theorems probability 0, and assign values between 0 and 1 to some undecidable statements. There will necessarily be other undecidable statements that can't be assigned a probability at all, otherwise we would be able to contravene Gödel's Theorems. (If we could algorithmically assign a coherent probability to every sentence then we could form a complete extension of \sf{PA} by considering all statements in order and calculating their probability given our axioms, adding a new statement to our axioms whenever we found that its probability was greater than 0.) I don't think we'll ever be able to assign a probability to the statement that V obeys the Axiom of Choice, given that V is a model of \sf{ZFC}.

view this post on Zulip Peva Blanchard (Nov 23 2024 at 11:42):

This is interesting. I wonder what is the class of statements to which your method can assign a probability value.

view this post on Zulip John Baez (Nov 23 2024 at 23:08):

Oscar wrote:

If I just want people to know about them then a blog post will serve equally well.

It's not really equally good, though it may be good enough. I put lots of ideas on blogs, and it seems rather few people notice them until I publish them or at least put them on the arXiv. In fact there are a couple of papers I want to write, to deal with this problem....

view this post on Zulip Mike Shulman (Nov 24 2024 at 07:52):

Another problem with blog posts is that they aren't subject to peer review, so even people who notice them won't have the same level of trust in them. So if you want people to cite and build on your work, I do think a traditional publication is more effective.

view this post on Zulip Morgan Rogers (he/him) (Nov 24 2024 at 15:01):

You don't have to write it by yourself either. Even if it doesn't advance your career, letting a grad student participate in the writing would be good for them.

view this post on Zulip John Baez (Nov 24 2024 at 18:40):

Indeed this project should be on the list of open problems we're talking about elsewhere. But a blog article could help publicize it.

view this post on Zulip Matteo Capucci (he/him) (Nov 26 2024 at 12:04):

Oscar Cunningham said:

Matteo Capucci (he/him) said:

Very cool stuff! I suspect what you're doing is constructing a measure/probability measure on the space of formulae for a given language, such that it respects congruence and basic substitution rules?

Yes, except to be precise I wouldn't say that I was constructing a probability measure on the space of formulae, because the formulae aren't the individual elements of the measure space but rather correspond to its subsets, the 'events'. The thing that I'm putting the measure on is the set of models, or from a syntactic point of view the set of complete theories.

I see. So you're fixing a language L (or maybe a theory T?), and then P(\varphi) should correspond to the fraction of models M \in {\bf Mod}(T) that satisfy \varphi?