You're reading the public-facing archive of the Category Theory Zulip server.
To join the server you need an invite. Anybody can get an invite by contacting Matteo Capucci at name dot surname at gmail dot com.
For all things related to this archive refer to the same person.
I've discovered an interesting way to extend first-order logic to assign 'probabilities' to logical formulas. For example we get values like $$P(x = y \wedge y \neq z \mid w, x, y, z \in S) = \frac{1}{5}.$$
Has anything like this already been discovered?
The idea is to treat conditional probabilities $P(\phi \mid \psi)$ as analogous to sequents $\psi \vdash \phi$. Irritatingly the conventions change the order of $\phi$ and $\psi$. I also have a semantic notion which I'll write as $P(\psi \vDash \phi)$ by analogy to the semantic consequence relation $\vDash$. One of the interesting things will be that these syntax and semantics agree with each other despite not initially seeming related.
The syntactic side of things uses the Principle of Indifference, which says that if we don't know anything to distinguish two possibilities, then we should assign them the same probability. I interpret this formally by a rule that says that $\alpha$-equivalent sequents have the same probability.
This reminds me of this paper (logical induction), but I never managed to find the time to read it thoroughly.
Also, Solomonoff tried to define a "universal prior" in his theory of inductive inference.
Ah yes, those are definitely related. A logical inductor also assigns probabilities to logical statements, but my understanding is that the probabilities it assigns to undecidable statements can vary depending on arbitrary implementation details, whereas the method I'm developing gives specific values to them. Likewise the exact values given by Solomonoff induction depend on the implementation.
The other principle that the syntax uses is that introduction of a new symbol via a definition should not change the probabilities. Like if you're playing a game and wondering what card your opponent has, then deciding to call the red cards 'foo' and the black cards 'bar' doesn't change the probability they have the seven of clubs.
Already this allows you to make some deductions.
Let's do a simple example in propositional logic. We want to calculate the probability that some proposition $p$ is true, given that we know nothing else about it. So we want to find $P(p)$.
We can introduce a new proposition $q$ by defining it as $q \leftrightarrow \neg p$. Since definitions don't change probabilities we have $$P(p) = P(p \mid q \leftrightarrow \neg p).$$
Now since we have $q \leftrightarrow \neg p$ we can rewrite $p$ as $\neg q$. Likewise we can rewrite $q \leftrightarrow \neg p$ as the logically equivalent $p \leftrightarrow \neg q$. This gives $$P(p \mid q \leftrightarrow \neg p) = P(\neg q \mid p \leftrightarrow \neg q).$$
Now the right hand side has the form of a definition of $p$, which we may remove: $$P(\neg q \mid p \leftrightarrow \neg q) = P(\neg q).$$
Finally by $\alpha$-equivalence we have $$P(\neg q) = P(\neg p).$$
Separately we have the law of probability $$P(p) + P(\neg p) = 1.$$
Substituting one into the other yields $$2P(p) = 1,$$
from which it follows that $P(p) = \frac{1}{2}$.
(I hope I've left people wondering how stupid tricks like the above can lead to a proof that the probability that a surjection is also injective is $e^{2-e}$. I'll write more later.)
So having seen that $P(p)$ is just $1/2$, you can probably guess what the semantics are for propositional logic. We just assign a fair coin flip to each atomic proposition and then take the usual probability. So for example $P(p \vee q \vDash p) = \frac{2}{3}$, since out of the four possible outcomes for two coinflips, three have $p \vee q$, and only two of those have $p$.
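Here's a throwaway check of that $2/3$ by enumerating the coin flips directly (my own illustration):

```python
from itertools import product

# Coin-flip semantics: each atomic proposition is an independent fair coin.
# Compute P(p | p or q) by enumerating the four outcomes.
outcomes = list(product([False, True], repeat=2))
given = [(p, q) for p, q in outcomes if p or q]   # outcomes satisfying p ∨ q
prob = sum(1 for p, q in given if p) / len(given)
print(prob)  # 2/3
```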
Using similar arguments as above we can show that the syntactic reasoning gives the same answers, so we have an analogue of the completeness theorem for propositional logic.
Cool! When I see $e$ I think "groupoid cardinality".
The Machine Intelligence Research Institute (Yudkowsky's outfit) had a meeting on trying to assign probabilities to mathematical statements, and I blogged about it here and here. They did not arrive at specific formulas as you are doing. But if you publish something, it'd be worth citing some of their papers, and they might be interested in what you've done.
"groupoid cardinality" :smile: Good intuition!
Yeah, I grew up on LessWrong so I have a good idea of what they're interested in. I also remembered the Progic posts from the n-Category Cafe. It seems that Jon Williamson was also trying to extend logic to probabilities. My plan is to write some things up here as a sounding board and then recompile them more formally as something to link people to. I'm not in academia so I probably won't publish an actual paper.
Moving on to first-order logic, we can consider the example I gave in the first post: $P(x = y \wedge y \neq z \mid w, x, y, z \in S)$.
We're being asked the probability that some variables are or are not equal to each other. We'd like to use the Principle of Indifference like we did before. The Principle applies whenever our information is symmetric between two propositions. But we know that $=$ is an equivalence relation whereas $\neq$ is not, so we are not in fact indifferent between equality and its negation. We can't, for example, conclude that $x = y$ has the same probability as $x \neq y$.
The trick is to forget temporarily what we know about equality, and then remember it again. People sometimes work in first-order logic without equality. Then if they decide they actually do want equality then they can just add a new symbol '$=$' to their language, and add some axioms stating that it's an equivalence relation.
So if we write $P^-$ for probabilities without equality, we can imagine adding a binary relation $\sim$ with no particular properties. Then the Principle of Indifference does apply, and using a similar argument to what we did in propositional logic we would get that $$P^-(x \sim y \mid x, y \in S) = \frac{1}{2},$$
and similarly if we have four variables we would find that all ways that they could behave with respect to $\sim$ were equally likely.
Then we add back the condition that $\sim$ is an equivalence relation on $S$. We find that of those 1024 relations, only 15 are actually equivalence relations. And of those 15, only 3 satisfy $x \sim y \wedge \neg(y \sim z)$. So we have returned to logic-with-equality, and we have $$P(x = y \wedge y \neq z \mid w, x, y, z \in S) = \frac{3}{15} = \frac{1}{5}.$$
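Those counts can be verified by brute force. Here's a quick script of my own (treating the four variables $w, x, y, z$ as the elements $0, 1, 2, 3$); it counts the symmetric relations, the equivalence relations among them, and the ones satisfying $x \sim y$ but not $y \sim z$:

```python
from itertools import product

elems = range(4)  # the four variables w, x, y, z as 0, 1, 2, 3
pairs = [(a, b) for a in elems for b in elems]

# All symmetric binary relations on four elements.
symmetric = []
for bits in product([False, True], repeat=len(pairs)):
    r = dict(zip(pairs, bits))
    if all(r[(a, b)] == r[(b, a)] for a in elems for b in elems):
        symmetric.append(r)

# Those that are also reflexive and transitive, i.e. equivalence relations.
equiv = [r for r in symmetric
         if all(r[(a, a)] for a in elems)
         and all(r[(a, c)] or not (r[(a, b)] and r[(b, c)])
                 for a in elems for b in elems for c in elems)]

w, x, y, z = 0, 1, 2, 3
satisfying = [r for r in equiv if r[(x, y)] and not r[(y, z)]]

print(len(symmetric), len(equiv), len(satisfying))  # 1024 15 3
```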
What about the semantic side?
How do we relate first-order logic to actual random variables?
We can imagine that if we actually knew the size of $S$ then things would be easier. Suppose $S$ has a finite cardinality $n$, and that $w$, $x$, $y$ and $z$ are chosen from it uniformly at random. Then there's a $\frac{1}{n}$ chance that $x = y$ and an $\frac{n-1}{n}$ chance that $y \neq z$. So overall our chance of $x = y \wedge y \neq z$ would be $\frac{n-1}{n^2}$.
If we had a probability distribution for $|S|$ we could then take a weighted average of $\frac{n-1}{n^2}$ over these probabilities to get our overall answer.
What probability distribution should we choose? To represent our lack of information about $S$, we want something like the 'uniform distribution over all cardinalities'. But sadly there is no uniform distribution on $\mathbb{N}$, never mind on all cardinalities.
The solution is to use the weighting from the groupoid cardinality. We let the probability of a set be inversely proportional to the size of its automorphism group. So in this case the weight of a set of size $n$ is $\frac{1}{n!}$. I won't explain why groupoid cardinality is a good idea, because there are already articles about it by Terry Tao, Qiaochu Yuan and John Baez. What hope do I have of explaining it better than the holy trinity? Suffice to say that it gives a natural distribution on finite sets which is in a categorical sense 'uniform' while still converging. (And indeed we could apply it to infinite sets. But their automorphism groups are uncountably infinite, so we'll say they have probability $0$.)
To get the weights to behave as probabilities, we have to normalise them so that they sum to $1$. Since $\sum_{n=0}^{\infty} \frac{1}{n!} = e$, we get that the probability of $|S| = n$ is $\frac{1}{e \cdot n!}$.
There's one last thing we need to worry about: the type declaration $w, x, y, z \in S$ on the right hand side of our conditional probability. We are used to finding probabilities of propositions. But $w, x, y, z \in S$ is not a proposition, it is a different kind of judgement: a typing judgement. This hasn't caused us problems so far because it has stayed safely on the right hand side of the conditional, where we are merely conditioning on it rather than actually calculating its probability. But I want to apply the definition of conditional probability, which brings something from the right to the left of the conditional.
Taking a leaf out of the 'propositions as types' playbook, I'm going to say that $P(w, x, y, z \in S)$ is the expected number of ways to pick $w$, $x$, $y$ and $z$ from $S$, i.e. the weighted average of $n^4$. This can be greater than $1$, which is unusual for a probability, but we are taking the probability of a whole new kind of judgement, so it's fair for them not to behave exactly like propositions. Likewise $P(x = y \wedge y \neq z,\ w, x, y, z \in S)$ will be the average number of ways to pick $w$, $x$, $y$ and $z$ such that $x = y \wedge y \neq z$ is satisfied, i.e. the weighted average of $n^2(n-1)$.
Now we can do our calculation: $$P(x = y \wedge y \neq z \mid w, x, y, z \in S) = \frac{P(x = y \wedge y \neq z,\ w, x, y, z \in S)}{P(w, x, y, z \in S)} = \frac{\sum_n \frac{n^2(n-1)}{e \cdot n!}}{\sum_n \frac{n^4}{e \cdot n!}}.$$
Finally we bring out our secret weapon: Dobiński's formula. This says that $\sum_{n=0}^{\infty} \frac{n^k}{e \cdot n!}$ is $B_k$. The Bell number $B_k$ is defined as the number of ways to partition a $k$-element set. So here we have $$P(x = y \wedge y \neq z \mid w, x, y, z \in S) = \frac{B_3 - B_2}{B_4} = \frac{5 - 2}{15} = \frac{1}{5}.$$
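The series converge quickly, so the two expectations are easy to check numerically with truncated sums (a quick script of mine):

```python
import math

# Weighted average of f(n) under the weights 1/(e * n!), truncating the series.
def expected(f, terms=60):
    return sum(f(n) / math.factorial(n) for n in range(terms)) / math.e

num = expected(lambda n: n * n * (n - 1))  # ways to pick w,x,y,z with x = y != z
den = expected(lambda n: n ** 4)           # all ways to pick w, x, y, z
print(num, den, num / den)  # ≈ 3.0 15.0 0.2
```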
Now you might think that I've taken a few dodgy steps. Either on the syntactic side, or the semantic side, or both. But isn't it interesting that the two completely different methods give exactly the same answer? The agreement between these two calculations is what convinced me that there was something significant to be found here.
(Tomorrow I'll write up assigning probabilities to some formulas with quantifiers in them.)
Oscar Cunningham said:
It seems that Jon Williamson was also trying to extend logic to probabilities.
Just to point out that Jon, an (ex)-colleague at Kent of many years, developed ideas on inductive logic (cf. his book).
This is fascinating stuff, @Oscar Cunningham. If you can check that these assignments of probabilities are 'coherent' in the usual sense (e.g. the probability of exhaustive and mutually exclusive events sums to 1), that would be really great. I think that any systematic and elegant way of coherently assigning probabilities to mathematical propositions is interesting - I wouldn't worry too much about what it "really means", or more precisely I hope such worries don't stop anyone from developing such a theory!
Even 'unnormalized probabilities' are interesting:
This can be greater than 1, which is unusual for a probability, but we are taking the probability of a whole new kind of judgement, so it's fair for them not to behave exactly like propositions.
In physics we often use 'unnormalized probabilities', i.e. nonnegative numbers that needn't sum to 1. For example if a system has some finite set of states and the energy of the $i$th state is $E_i$, in thermal equilibrium at temperature $T$ the unnormalized probability of finding the system in the $i$th state is $$e^{-E_i / kT}$$ where $k$ is Boltzmann's constant. Unnormalized probabilities add and multiply the way probabilities do - they add for exhaustive and mutually exclusive events, and multiply for independent events. We extract probabilities from them only at the very end of our calculations, by normalizing them.
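A minimal sketch of that workflow, with made-up energies in units of $kT$:

```python
import math

# Unnormalized Boltzmann weights for a three-state system; probabilities
# are extracted only at the end by dividing by the partition function Z.
energies_over_kT = [0.0, 1.0, 2.0]           # assumed example values
weights = [math.exp(-E) for E in energies_over_kT]
Z = sum(weights)                             # partition function
probs = [w / Z for w in weights]
print(probs)  # sums to 1; lower-energy states are more probable
```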
I've talked a bit to @Tobias Fritz about this: like me he wants to take these unnormalized probabilities quite seriously, in part because they're governed by the math of convex cones, which is better behaved (or at least simpler) than the math of convex sets.
Btw, I love Dobinski's formula and explained a proof here, where I used it to answer this puzzle:
Suppose raindrops are falling on your head, randomly and independently, at an average rate of one per minute. What’s the average of the cube of the number of raindrops that fall on your head in one minute?
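The answer is the third moment of a Poisson distribution with mean 1, which Dobiński's formula says is the Bell number $B_3 = 5$. A Monte Carlo sketch of my own (using Knuth's sampling method) agrees:

```python
import math
import random

def poisson1():
    # Knuth's algorithm for sampling a Poisson variable with mean 1
    L, k, p = math.exp(-1), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

random.seed(0)
n = 10 ** 6
avg = sum(poisson1() ** 3 for _ in range(n)) / n
print(avg)  # ≈ 5, the Bell number B_3
```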
I should have said that the strange greater-than-$1$ probabilities only appear when we take the probability of a type declaration. When we take the probability of a proposition it always comes out to between $0$ and $1$, even if type declarations were used along the way. Like above when we worked out $P(x = y \wedge y \neq z \mid w, x, y, z \in S)$ as $$\frac{P(x = y \wedge y \neq z,\ w, x, y, z \in S)}{P(w, x, y, z \in S)},$$
the numerator and denominator were $3$ and $15$, both greater than $1$, but their quotient was a sane $\frac{1}{5}$. If $\phi$ is some proposition then the probability $P(\phi \mid w, x, y, z \in S)$, calculated as $\frac{P(\phi,\ w, x, y, z \in S)}{P(w, x, y, z \in S)}$, is always less than $1$ because the structures satisfying $\phi$ form a subset of the set of all structures for the context. Because of this I don't think we ever have to do any normalization. We just have to get used to the fact that a probability conditional on a type judgement is calculated as the ratio of two expected values, rather than two mere probabilities.
Regarding a coherence proof, I hope that this will turn out to be trivial from the semantic point of view, since the values there represent actual probabilities. Then proving it from the syntactic point of view will be a matter of proving a completeness theorem saying that the syntactic and semantic probabilities are the same. The problem will be that from the semantic point of view there can be contexts where the expected values don't converge. Like the context $f \colon S \to S$: the expected number of functions is $\sum_n \frac{n^n}{e \cdot n!}$, which diverges.
To prove a completeness theorem I will have to deal with these, or more likely find a way to say something useful despite not dealing with them. More on that later.
The other thing I should point out is that we can use the reciprocal of the number of automorphisms to work out the probabilities of any structure, not just sets. Suppose we have a structure $X$ on some set with size $n$. Then any permutation of the set acts on $X$ to give us another structure on it, isomorphic to $X$. But if the permutation is actually an automorphism of $X$ then this structure will just be $X$ again. So the total number of structures isomorphic to $X$ is $\frac{n!}{|\mathrm{Aut}(X)|}$. This multiplies with the $\frac{1}{e \cdot n!}$ factor we have on sets to give $\frac{1}{e \cdot |\mathrm{Aut}(X)|}$.
For example, consider sets equipped with a predicate: pairs $(S, R)$ with $R \subseteq S$. One way to weigh these would be to consider that a set of size $n$ has $2^n$ possible predicates on it, so the total weight of them over all sets would be $\sum_n \frac{2^n}{e \cdot n!} = \frac{e^2}{e} = e$. The other way to weigh them would be to say that up to isomorphism a set with a predicate is determined by the size $n$ of the set and the number $k$ of elements for which the predicate is true. An automorphism of such a set is given by a permutation of the true elements and a permutation of the false elements. So the total weight is $\sum_{n=0}^{\infty} \sum_{k=0}^{n} \frac{1}{e \cdot k! (n-k)!}$, which one can prove using the binomial theorem is also $e$.
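Both weightings are easy to check numerically with truncated series (a throwaway script of mine):

```python
import math

terms = 40

# Way 1: each set of size n carries 2^n predicates, each with weight 1/(e * n!).
way1 = sum(2 ** n / math.factorial(n) for n in range(terms)) / math.e

# Way 2: sum over iso classes (n, k), weighted by 1/(e * k! * (n - k)!).
way2 = sum(1 / (math.factorial(k) * math.factorial(n - k))
           for n in range(terms) for k in range(n + 1)) / math.e

print(way1, way2)  # both ≈ e ≈ 2.71828
```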
An important consequence of this is that the probabilities don't depend on how we describe a given structure. Two equivalent definitions will have the same number of automorphisms and hence the same probabilities. For example we could define a group in terms of just its multiplication operation, satisfying a bunch of axioms including the existence of an identity. Or we could define a group in terms of sets equipped with multiplication, inversion and an identity. Either way a given group will have the same automorphism group and hence the same probability. We can use this to easily calculate that a group of size $6$ has a probability $\frac{3}{4}$ of being abelian, because $S_3$ has $6$ automorphisms and $C_6$ only has $2$, giving weights $\frac{1}{6}$ and $\frac{1}{2}$, and $\frac{1/2}{1/2 + 1/6} = \frac{3}{4}$.
I said I'd say something about formulas with quantifiers in them. The simplest such formula is $\forall x . \bot$. This is true only when $S$ is empty. From the semantic point of view the probability is easy to calculate: the distribution gives the empty set the probability $\frac{1}{e \cdot 0!} = \frac{1}{e}$, so $P(\forall x . \bot \mid S) = \frac{1}{e}$.
From the syntactic point of view things are harder. The first thing to notice is that $\frac{1}{e}$ is transcendental. So it will be impossible to derive a probability of $\frac{1}{e}$ by starting with simple axioms and taking finitely many steps ('simple' meaning not mentioning $e$ directly). I will eventually deal with this, but first let's focus on something easier. I will show that the probability of a set having size $n$ is $n$ times less than the probability that a set has size $n - 1$. Again we will use the principle of indifference and the principle that definitions do not change probabilities.
We want to calculate $P(\#S = n \mid S)$. The expression '$\#S = n$' is shorthand for some formula equivalent to the statement that $S$ has $n$ elements. (Usually I like to write $|S|$ for the size of a set, but it's confusing with the vertical bars also being used for conditionals.) We begin by defining a new set $T$ as the coproduct $S + 1$: $$P(\#S = n \mid S) = P(\#S = n \mid S,\ T = S + 1).$$
Under this context, $\#S = n$ is equivalent to $\#T = n + 1$.
Then we note that this is equivalent to a definition of $S$ as the subset of $T$ containing the elements not equal to the added point $\star$. Removing that definition of $S$ leaves $$P(\#T = n + 1 \mid T,\ \star \in T).$$
Note that this has left us with an '$\star \in T$' on the right hand side. Bayes' Theorem lets us bring it to the left: $$P(\#T = n + 1 \mid T,\ \star \in T) = \frac{P(\#T = n + 1,\ \star \in T \mid T)}{P(\star \in T \mid T)}.$$
The expected size of a set with cardinality $n + 1$ is $n + 1$, so we have $P(\#T = n + 1,\ \star \in T \mid T) = (n + 1) \cdot P(\#T = n + 1 \mid T)$. The denominator, $P(\star \in T \mid T)$, is harder to deal with, so for now let us abbreviate this factor as $C$. So we have derived $$P(\#S = n \mid S) = \frac{(n + 1) \cdot P(\#T = n + 1 \mid T)}{C}.$$
By a recursion down to $\#S = 0$ we have: $$P(\#S = n \mid S) = \frac{C^n}{n!} \cdot P(\#S = 0 \mid S).$$
A Poisson distribution!
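Taking the weighting $P(\#S = n) = \frac{1}{e \cdot n!}$ from before, we can check numerically that the constant works out to $C = 1$ and that the recursion holds (my own sanity check, assuming that weighting):

```python
import math

# P(#S = n) = 1/(e * n!) is Poisson with rate 1, and C = E[#S] = 1.
P = lambda n: 1 / (math.e * math.factorial(n))
C = sum(n * P(n) for n in range(100))   # expected size of a random set
for n in range(10):
    # the recursion P(n+1) = C * P(n) / (n+1)
    assert abs(P(n + 1) - C * P(n) / (n + 1)) < 1e-15
print(round(C, 12))  # 1.0
```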
(Next I will explain how to deal with the fact that $\frac{1}{e}$ is transcendental, and how to fix $C$ as $1$.)
Very cool stuff! I suspect what you're doing is constructing a measure/probability measure on the space of formulae for a given language, such that it respects congruence and basic substitution rules?
The fact such spaces are 'finitely generated' in some sense makes it possible to build this measure inductively, sort of.
The fact you can wing higher-order stuff with groupoid cardinality is really fascinating
Matteo Capucci (he/him) said:
Very cool stuff! I suspect what you're doing is constructing a measure/probability measure on the space of formulae for a given language, such that it respects congruence and basic substitution rules?
This is what he's aiming to do. As mentioned, the AI crowd at the Machine Intelligence Research Institute have written papers about such probability measures. The basic idea:
How should a rational agent assign probabilities to statements in mathematics? Of course an omniscient being could assign
• probability 1 to every mathematical statement that’s provable,
• probability 0 to every statement whose negation is provable,
and
• probability ??? to every statement that is neither provable nor disprovable.
But a real-world rational agent will never have time to check all proofs, so there will always be lots of statements it’s not sure about. Actual mathematicians always have conjectures, like the Twin Prime Conjecture, that we consider plausible even though nobody has proved them. And whenever we do research, we’re constantly estimating how likely it is for statements to be true, and changing our estimates as new evidence comes in. In other words, we use scientific induction in mathematics.
How could we automate this? Most of us don’t consciously assign numerical probabilities to mathematical statements. But maybe an AI mathematician should. If so, what rules should it follow?
It's possible that Oscar's work could help here, by providing 'prior probabilities' that we could then update. (I'm actually not sure it does help, but it's a thought.)
Oscar Cunningham said:
I'm not in academia so I probably won't publish an actual paper.
I don't see why you shouldn't ;)
I was going to say that too. If this stuff actually works, and Oscar can either prove it works or assemble enough evidence to make a plausible conjecture, it really deserves a paper. This stuff looks cool!
I can't tell whether he was saying "since I'm not in academia I don't have time to go through the rigamarole of publishing a paper" or "since I'm not in academia probably no decent journal would publish a paper if I wrote one" - or both. If the latter, I hope that's not really true.
Do probability 1 statements necessarily correspond to tautologies @Oscar Cunningham ? If so, that sounds like a solid foundation for some kind of "statistical proof theory"
But if you had some complicated statement that incidentally was a tautology, but wasn't obviously so, shouldn't the formalism not be able to assign a probability (edit: probability of value 1) to the statement?
Like in their example $P(p) = \frac{1}{2}$, shouldn't some tautology be a possible value of $p$, and this theorem then abstracts over this possibility?
Maybe I'm confused, is the idea that a statement has probability one after you establish its truth as a theorem, which tautologies are supposed to encode? I'm not very good at logic.
I really like the idea of preserving probability under equivalence and definition introductions.
Given the soundness and completeness of classical logic, 'theorems' are the same as 'tautologies'. So don't worry about the difference too much now, though it takes work to prove they're the same.
The hard problem is getting a reasonable way of assigning probabilities to statements that doesn't assign probability 1 to every theorem (i.e. every tautology). How do you decide which statements are tautologies 'but not obviously so'?
I'm not saying this is an unsolvable problem - far from it! I just haven't seen anyone solve it.
I don't think there should be a unique way to assign probabilities to statements. The way we assign probabilities can depend on our current knowledge. To turn this into math, we'd have to formalize the idea of a collection of 'known' statements. Maybe we should allow this to be an arbitrary set of statements. Once you start demanding that this set be closed under deduction rules, it tends to become infinite.
In simple terms: if we assume that you know all the consequences of what you know, then you know an infinite amount of stuff. So that's not realistic.
But Oscar's ideas are interesting because they seem to avoid the issue of formalizing what's 'known' or 'obvious'.
Yeah that makes sense to me.
It seems like a solution would be like assigning a (probability) mass to all the points of a space, and it turns out the "space" in this context is the sentences of some formal language - which necessarily includes sentences we have reason to believe are true. If it's first order logic, that would include all the theorems.
You could know a theorem was true, but maybe you wanted to use this formalism to express a student's current understanding, so you'd still want to be able to assign a probability less than one to the theorem the student didn't understand yet. Then you could try to reason out what the student would think was likely, with the usual laws of probability.
Been busy for a couple of weeks, but I've got some time now.
Matteo Capucci (he/him) said:
Very cool stuff! I suspect what you're doing is constructing a measure/probability measure on the space of formulae for a given language, such that it respects congruence and basic substitution rules?
Yes, except to be precise I wouldn't say that I was constructing a probability measure on the space of formulae, because the formulae aren't the individual elements of the measure space but rather correspond to its subsets, the 'events'. The thing that I'm putting the measure on is the set of models, or from a syntactic point of view the set of complete theories.
Morgan Rogers (he/him) said:
Oscar Cunningham said:
I'm not in academia so I probably won't publish an actual paper.
I don't see why you shouldn't ;)
John Baez said:
I was going to say that too. If this stuff actually works, and Oscar can either prove it works or assemble enough evidence to make a plausible conjecture, it really deserves a paper. This stuff looks cool!
I can't tell whether he was saying "since I'm not in academia I don't have time to go through the rigamarole of publishing a paper" or "since I'm not in academia probably no decent journal would publish a paper if I wrote one" - or both. If the latter, I hope that's not really true.
It's a combination of these things. I certainly don't have much free time, as you can tell from the fact it's been two weeks since I last posted on here. The other factor is incentives. Since I'm not an academic there's not necessarily a professional advantage to having these ideas published in a paper. If I just want people to know about them then a blog post will serve equally well. It might be fun to have a paper, but I'm not sure it's worth the additional hours of chasing down references, battling with LaTeX, responding to reviewers, etc..
Morgan Rogers (he/him) said:
Do probability 1 statements necessarily correspond to tautologies Oscar Cunningham ? If so, that sounds like a solid foundation for some kind of "statistical proof theory"
That would be nice, but at the moment I don't think it will work that way. It's like when I throw a dart at a dartboard: any singleton set has area $0$, so the event 'hit the exact centre' has probability $0$, but it's not literally impossible to hit the centre. I have to hit some point. Likewise we might find formulae that have probability $1$ but which aren't tautologies. Definitely all tautologies should have probability $1$.
Alex Kreitzberg said:
But if you had some complicated statement that incidentally was a tautology, but wasn't obviously so, shouldn't the formalism not be able to assign a probability (edit: probability of value 1) to the statement?
I'm not trying to do the same thing as MIRI's Logical Inductors paper, where they have probabilities that slowly update as you do more calculations. In their formalism the probability of a statement like 'the $n$th digit of $\pi$ is $7$' is close to $\frac{1}{10}$ to begin with, and then changes to $0$ or $1$ after you've done enough calculation to determine what the digit is.
I'm not worrying about how long it takes to do calculations, but instead assigning probabilities to formulas that are undecidable. For example given only that $x$ and $y$ are elements of some set $S$, it's undecidable whether or not $x = y$. No amount of further calculation will tell you whether they're equal or not. Nevertheless I claim that we can say that given this state of knowledge it's correct to assign the formula '$x = y$' a probability of $\frac{1}{2}$.
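For what it's worth, that $\frac{1}{2}$ also drops out of the semantic ratio-of-expectations recipe; here's a quick numerical check of mine:

```python
import math

# Weighted average under 1/(e * n!): these expectations are Bell numbers.
E = lambda f: sum(f(n) / math.factorial(n) for n in range(60)) / math.e

# P(x = y | x, y in S) = E[# pairs with x = y] / E[# pairs] = B_1 / B_2
prob = E(lambda n: n) / E(lambda n: n * n)
print(prob)  # ≈ 0.5
```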
My system will assign all theorems probability $1$, all negations of theorems probability $0$, and assign values between $0$ and $1$ to some undecidable statements. There will necessarily be other undecidable statements that can't be assigned a probability at all, otherwise we would be able to contravene Gödel's Theorems. (If we could algorithmically assign a coherent probability to every sentence then we could form a complete extension of our theory by considering all statements in order and calculating their probability given our axioms, adding a new statement to our axioms whenever we found that its probability was greater than $\frac{1}{2}$.) I don't think we'll ever be able to assign a probability to the statement that $M$ obeys the Axiom of Choice, given that $M$ is a model of $\mathrm{ZF}$.
This is interesting. I wonder what is the class of statements to which your method can assign a probability value.
Oscar wrote:
If I just want people to know about them then a blog post will serve equally well.
It's not really equally good, though it may be good enough. I put lots of ideas on blogs, and it seems rather few people notice them until I publish them or at least put them on the arXiv. In fact there are a couple of papers I want to write, to deal with this problem....
Another problem with blog posts is that they aren't subject to peer review, so even people who notice them won't have the same level of trust in them. So if you want people to cite and build on your work, I do think a traditional publication is more effective.
You don't have to write it by yourself either. Even if it doesn't advance your career, letting a grad student participate in the writing would be good for them.
Indeed this project should be on the list of open problems we're talking about elsewhere. But a blog article could help publicize it.
Oscar Cunningham said:
Matteo Capucci (he/him) said:
Very cool stuff! I suspect what you're doing is constructing a measure/probability measure on the space of formulae for a given language, such that it respects congruence and basic substitution rules?
Yes, except to be precise I wouldn't say that I was constructing a probability measure on the space of formulae, because the formulae aren't the individual elements of the measure space but rather correspond to its subsets, the 'events'. The thing that I'm putting the measure on is the set of models, or from a syntactic point of view the set of complete theories.
I see. So you're fixing a language $L$ (or maybe a theory $T$?), and then $P(\phi)$ should correspond to the fraction of models that satisfy $\phi$?