You're reading the public-facing archive of the Category Theory Zulip server.
To join the server you need an invite. Anybody can get an invite by contacting Matteo Capucci at name dot surname at gmail dot com.
For all things related to this archive refer to the same person.
I've discovered an interesting way to extend first-order logic to assign 'probabilities' to logical formulas. For example we get values like $$P(x = y \wedge y \neq z \mid w, x, y, z \in S) = \frac{1}{5}.$$
Has anything like this already been discovered?
The idea is to treat conditional probabilities $P(\phi \mid \psi)$ as analogous to sequents $\psi \vdash \phi$. Irritatingly the conventions change the order of $\phi$ and $\psi$. I also have a semantic notion which I'll write as $P(\psi \vDash \phi)$ by analogy to the semantic consequence relation $\vDash$. One of the interesting things will be that these syntax and semantics agree with each other despite not initially seeming related.
The syntactic side of things uses the Principle of Indifference, which says that if we don't know anything to distinguish two possibilities, then we should assign them the same probability. I interpret this formally by a rule that says that $\alpha$-equivalent sequents have the same probability.
This reminds me of this paper (logical induction), but I never managed to find the time to read it thoroughly.
Also, Solomonoff tried to define a "universal prior" in his theory of inductive inference.
Ah yes, those are definitely related. A logical inductor also assigns probabilities to logical statements, but my understanding is that the probabilities it assigns to undecidable statements can vary depending on arbitrary implementation details, whereas the method I'm developing gives specific values to them. Likewise the exact values given by Solomonoff induction depend on the implementation.
The other principle that the syntax uses is that introduction of a new symbol via a definition should not change the probabilities. Like if you're playing a game and wondering what card your opponent has, then deciding to call the red cards 'foo' and the black cards 'bar' doesn't change the probability they have the seven of clubs.
Already this allows you to make some deductions.
Let's do a simple example in propositional logic. We want to calculate the probability that some proposition $p$ is true, given that we know nothing else about it. So we want to find $P(p)$.
We can introduce a new proposition $q$ by defining it as $q \leftrightarrow \neg p$. Since definitions don't change probabilities we have $$P(p) = P(p \mid q \leftrightarrow \neg p).$$
Now since we have $q \leftrightarrow \neg p$ we can rewrite $p$ as $\neg q$. Likewise we can rewrite $q \leftrightarrow \neg p$ as the logically equivalent $p \leftrightarrow \neg q$. This gives $$P(p \mid q \leftrightarrow \neg p) = P(\neg q \mid p \leftrightarrow \neg q).$$
Now the right hand side has the form of a definition of $p$, which we may remove: $$P(\neg q \mid p \leftrightarrow \neg q) = P(\neg q).$$
Finally by $\alpha$-equivalence we have $$P(\neg q) = P(\neg p).$$
Separately we have the law of probability $$P(p) + P(\neg p) = 1.$$
Substituting one into the other yields $$2P(p) = 1,$$
from which it follows that $P(p) = \frac{1}{2}$.
(I hope I've left people wondering how stupid tricks like the above can lead to a proof that the probability that a surjection is also injective is $e^{2-e}$. I'll write more later.)
So having seen that $P(p)$ is just $1/2$, you can probably guess what the semantics are for propositional logic. We just assign a fair coin flip to each atomic proposition and then take the usual probability. So for example $P(p \vee q \vDash p) = \frac{2}{3}$, since out of the four possible outcomes for two coinflips, three have $p \vee q$, and only two of those have $p$.
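Here's a throwaway check of that $2/3$ by enumerating the coin flips directly (my own illustration):

```python
from itertools import product

# Coin-flip semantics: each atomic proposition is an independent fair coin.
# Compute P(p | p or q) by enumerating the four outcomes.
outcomes = list(product([False, True], repeat=2))
given = [(p, q) for p, q in outcomes if p or q]   # outcomes satisfying p ∨ q
prob = sum(1 for p, q in given if p) / len(given)
print(prob)  # 2/3
```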
Using similar arguments as above we can show that the syntactic reasoning gives the same answers, so we have an analogue of the completeness theorem for propositional logic.
Cool! When I see $e$ I think "groupoid cardinality".
The Machine Intelligence Research Institute (Yudkowsky's outfit) had a meeting on trying to assign probabilities to mathematical statements, and I blogged about it here and here. They did not arrive at specific formulas as you are doing. But if you publish something, it'd be worth citing some of their papers, and they might be interested in what you've done.
"groupoid cardinality" :smile: Good intuition!
Yeah, I grew up on LessWrong so I have a good idea of what they're interested in. I also remembered the Progic posts from the n-Category Cafe. It seems that Jon Williamson was also trying to extend logic to probabilities. My plan is to write some things up here as a sounding board and then recompile them more formally as something to link people to. I'm not in academia so I probably won't publish an actual paper.
Moving on to first-order logic, we can consider the example I gave in the first post: $P(x = y \wedge y \neq z \mid w, x, y, z \in S)$.
We're being asked the probability that some variables are or are not equal to each other. We'd like to use the Principle of Indifference like we did before. The Principle applies whenever our information is symmetric between two propositions. But we know that $=$ is an equivalence relation whereas $\neq$ is not, so we are not in fact indifferent between equality and its negation. We can't, for example, conclude that $x = y$ has the same probability as $x \neq y$.
The trick is to forget temporarily what we know about equality, and then remember it again. People sometimes work in first-order logic without equality. Then if they decide they actually do want equality then they can just add a new symbol '$=$' to their language, and add some axioms stating that it's an equivalence relation.
So if we write $P^-$ for probabilities without equality, we can imagine adding a binary relation $\sim$ with no particular properties. Then the Principle of Indifference does apply, and using a similar argument to what we did in propositional logic we would get that $$P^-(x \sim y \mid x, y \in S) = \frac{1}{2},$$
and similarly if we have four variables we would find that all ways that they could behave with respect to $\sim$ were equally likely.
Then we add back the condition that $\sim$ is an equivalence relation on $S$. We find that of those 1024 relations, only 15 are actually equivalence relations. And of those 15, only 3 satisfy $x \sim y \wedge \neg(y \sim z)$. So we have returned to logic-with-equality, and we have $$P(x = y \wedge y \neq z \mid w, x, y, z \in S) = \frac{3}{15} = \frac{1}{5}.$$
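Those counts can be verified by brute force. Here's a quick script of my own (treating the four variables $w, x, y, z$ as the elements $0, 1, 2, 3$); it counts the symmetric relations, the equivalence relations among them, and the ones satisfying $x \sim y$ but not $y \sim z$:

```python
from itertools import product

elems = range(4)  # the four variables w, x, y, z as 0, 1, 2, 3
pairs = [(a, b) for a in elems for b in elems]

# All symmetric binary relations on four elements.
symmetric = []
for bits in product([False, True], repeat=len(pairs)):
    r = dict(zip(pairs, bits))
    if all(r[(a, b)] == r[(b, a)] for a in elems for b in elems):
        symmetric.append(r)

# Those that are also reflexive and transitive, i.e. equivalence relations.
equiv = [r for r in symmetric
         if all(r[(a, a)] for a in elems)
         and all(r[(a, c)] or not (r[(a, b)] and r[(b, c)])
                 for a in elems for b in elems for c in elems)]

w, x, y, z = 0, 1, 2, 3
satisfying = [r for r in equiv if r[(x, y)] and not r[(y, z)]]

print(len(symmetric), len(equiv), len(satisfying))  # 1024 15 3
```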
What about the semantic side?
How do we relate first-order logic to actual random variables?
We can imagine that if we actually knew the size of $S$ then things would be easier. Suppose $S$ has a finite cardinality $n$, and that $w$, $x$, $y$ and $z$ are chosen from it uniformly at random. Then there's a $\frac{1}{n}$ chance that $x = y$ and an $\frac{n-1}{n}$ chance that $y \neq z$. So overall our chance of $x = y \wedge y \neq z$ would be $\frac{n-1}{n^2}$.
If we had a probability distribution for $|S|$ we could then take a weighted average of $\frac{n-1}{n^2}$ over these probabilities to get our overall answer.
What probability distribution should we choose? To represent our lack of information about $S$, we want something like the 'uniform distribution over all cardinalities'. But sadly there is no uniform distribution on $\mathbb{N}$, never mind on all cardinalities.
The solution is to use the weighting from the groupoid cardinality. We let the probability of a set be inversely proportional to the size of its automorphism group. So in this case the weight of a set of size $n$ is $\frac{1}{n!}$. I won't explain why groupoid cardinality is a good idea, because there are already articles about it by Terry Tao, Qiaochu Yuan and John Baez. What hope do I have of explaining it better than the holy trinity? Suffice to say that it gives a natural distribution on finite sets which is in a categorical sense 'uniform' while still converging. (And indeed we could apply it to infinite sets. But their automorphism groups are uncountably infinite, so we'll say they have probability $0$.)
To get the weights to behave as probabilities, we have to normalise them so that they sum to $1$. Since $\sum_{n=0}^{\infty} \frac{1}{n!} = e$, we get that the probability of $|S| = n$ is $\frac{1}{e \cdot n!}$.
There's one last thing we need to worry about: the type declaration $w, x, y, z \in S$ on the right hand side of our conditional probability. We are used to finding probabilities of propositions. But $w, x, y, z \in S$ is not a proposition, it is a different kind of judgement: a typing judgement. This hasn't caused us problems so far because it has stayed safely on the right hand side of the conditional, where we are merely conditioning on it rather than actually calculating its probability. But I want to apply the definition of conditional probability, which brings something from the right to the left of the conditional.
Taking a leaf out of the 'propositions as types' playbook, I'm going to say that $P(w, x, y, z \in S)$ is the expected number of ways to pick $w$, $x$, $y$ and $z$ from $S$, i.e. the weighted average of $n^4$. This can be greater than $1$, which is unusual for a probability, but we are taking the probability of a whole new kind of judgement, so it's fair for them not to behave exactly like propositions. Likewise $P(x = y \wedge y \neq z,\ w, x, y, z \in S)$ will be the average number of ways to pick $w$, $x$, $y$ and $z$ such that $x = y \wedge y \neq z$ is satisfied, i.e. the weighted average of $n^2(n-1)$.
Now we can do our calculation: $$P(x = y \wedge y \neq z \mid w, x, y, z \in S) = \frac{P(x = y \wedge y \neq z,\ w, x, y, z \in S)}{P(w, x, y, z \in S)} = \frac{\sum_n \frac{n^2(n-1)}{e \cdot n!}}{\sum_n \frac{n^4}{e \cdot n!}}.$$
Finally we bring out our secret weapon: Dobiński's formula. This says that $\sum_{n=0}^{\infty} \frac{n^k}{e \cdot n!}$ is $B_k$. The Bell number $B_k$ is defined as the number of ways to partition a $k$-element set. So here we have $$P(x = y \wedge y \neq z \mid w, x, y, z \in S) = \frac{B_3 - B_2}{B_4} = \frac{5 - 2}{15} = \frac{1}{5}.$$
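The series converge quickly, so the two expectations are easy to check numerically with truncated sums (a quick script of mine):

```python
import math

# Weighted average of f(n) under the weights 1/(e * n!), truncating the series.
def expected(f, terms=60):
    return sum(f(n) / math.factorial(n) for n in range(terms)) / math.e

num = expected(lambda n: n * n * (n - 1))  # ways to pick w,x,y,z with x = y != z
den = expected(lambda n: n ** 4)           # all ways to pick w, x, y, z
print(num, den, num / den)  # ≈ 3.0 15.0 0.2
```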
Now you might think that I've taken a few dodgy steps. Either on the syntactic side, or the semantic side, or both. But isn't it interesting that the two completely different methods give exactly the same answer? The agreement between these two calculations is what convinced me that there was something significant to be found here.
(Tomorrow I'll write up assigning probabilities to some formulas with quantifiers in them.)
Oscar Cunningham said:
It seems that Jon Williamson was also trying to extend logic to probabilities.
Just to point out that Jon, an (ex)-colleague at Kent of many years, developed ideas on inductive logic (cf. his book).
This is fascinating stuff, @Oscar Cunningham. If you can check that these assignments of probabilities are 'coherent' in the usual sense (e.g. the probability of exhaustive and mutually exclusive events sums to 1), that would be really great. I think that any systematic and elegant way of coherently assigning probabilities to mathematical propositions is interesting - I wouldn't worry too much about what it "really means", or more precisely I hope such worries don't stop anyone from developing such a theory!
Even 'unnormalized probabilities' are interesting:
This can be greater than 1, which is unusual for a probability, but we are taking the probability of a whole new kind of judgement, so it's fair for them not to behave exactly like propositions.
In physics we often use 'unnormalized probabilities', i.e. nonnegative numbers that needn't sum to 1. For example if a system has some finite set of states and the energy of the $i$th state is $E_i$, in thermal equilibrium at temperature $T$ the unnormalized probability of finding the system in the $i$th state is $$e^{-E_i / kT}$$ where $k$ is Boltzmann's constant. Unnormalized probabilities add and multiply the way probabilities do - they add for exhaustive and mutually exclusive events, and multiply for independent events. We extract probabilities from them only at the very end of our calculations, by normalizing them.
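A minimal sketch of that workflow, with made-up energies in units of $kT$:

```python
import math

# Unnormalized Boltzmann weights for a three-state system; probabilities
# are extracted only at the end by dividing by the partition function Z.
energies_over_kT = [0.0, 1.0, 2.0]           # assumed example values
weights = [math.exp(-E) for E in energies_over_kT]
Z = sum(weights)                             # partition function
probs = [w / Z for w in weights]
print(probs)  # sums to 1; lower-energy states are more probable
```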
I've talked a bit to @Tobias Fritz about this: like me he wants to take these unnormalized probabilities quite seriously, in part because they're governed by the math of convex cones, which is better behaved (or at least simpler) than the math of convex sets.
Btw, I love Dobinski's formula and explained a proof here, where I used it to answer this puzzle:
Suppose raindrops are falling on your head, randomly and independently, at an average rate of one per minute. What’s the average of the cube of the number of raindrops that fall on your head in one minute?
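The answer is the third moment of a Poisson distribution with mean 1, which Dobiński's formula says is the Bell number $B_3 = 5$. A Monte Carlo sketch of my own (using Knuth's sampling method) agrees:

```python
import math
import random

def poisson1():
    # Knuth's algorithm for sampling a Poisson variable with mean 1
    L, k, p = math.exp(-1), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

random.seed(0)
n = 10 ** 6
avg = sum(poisson1() ** 3 for _ in range(n)) / n
print(avg)  # ≈ 5, the Bell number B_3
```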
I should have said that the strange greater-than-$1$ probabilities only appear when we take the probability of a type declaration. When we take the probability of a proposition it always comes out to between $0$ and $1$, even if type declarations were used along the way. Like above when we worked out $P(x = y \wedge y \neq z \mid w, x, y, z \in S)$ as $$\frac{P(x = y \wedge y \neq z,\ w, x, y, z \in S)}{P(w, x, y, z \in S)},$$
the numerator and denominator were $3$ and $15$, both greater than $1$, but their quotient was a sane $\frac{1}{5}$. If $\phi$ is some proposition then the probability $P(\phi \mid w, x, y, z \in S)$, calculated as $\frac{P(\phi,\ w, x, y, z \in S)}{P(w, x, y, z \in S)}$, is always less than $1$ because the structures satisfying $\phi$ form a subset of the set of all structures for the context. Because of this I don't think we ever have to do any normalization. We just have to get used to the fact that a probability conditional on a type judgement is calculated as the ratio of two expected values, rather than two mere probabilities.
Regarding a coherence proof, I hope that this will turn out to be trivial from the semantic point of view, since the values there represent actual probabilities. Then proving it from the syntactic point of view will be a matter of proving a completeness theorem saying that the syntactic and semantic probabilities are the same. The problem will be that from the semantic point of view there can be contexts where the expected values don't converge. Like the context $f \colon S \to S$: the expected number of functions is $\sum_n \frac{n^n}{e \cdot n!}$, which diverges.
To prove a completeness theorem I will have to deal with these, or more likely find a way to say something useful despite not dealing with them. More on that later.
The other thing I should point out is that we can use the reciprocal of the number of automorphisms to work out the probabilities of any structure, not just sets. Suppose we have a structure $X$ on some set with size $n$. Then any permutation of the set acts on $X$ to give us another structure on it, isomorphic to $X$. But if the permutation is actually an automorphism of $X$ then this structure will just be $X$ again. So the total number of structures isomorphic to $X$ is $\frac{n!}{|\mathrm{Aut}(X)|}$. This multiplies with the $\frac{1}{e \cdot n!}$ factor we have on sets to give $\frac{1}{e \cdot |\mathrm{Aut}(X)|}$.
For example, consider sets equipped with a predicate: pairs $(S, R)$ with $R \subseteq S$. One way to weigh these would be to consider that a set of size $n$ has $2^n$ possible predicates on it, so the total weight of them over all sets would be $\sum_n \frac{2^n}{e \cdot n!} = \frac{e^2}{e} = e$. The other way to weigh them would be to say that up to isomorphism a set with a predicate is determined by the size $n$ of the set and the number $k$ of elements for which the predicate is true. An automorphism of such a set is given by a permutation of the true elements and a permutation of the false elements. So the total weight is $\sum_{n=0}^{\infty} \sum_{k=0}^{n} \frac{1}{e \cdot k! (n-k)!}$, which one can prove using the binomial theorem is also $e$.
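Both weightings are easy to check numerically with truncated series (a throwaway script of mine):

```python
import math

terms = 40

# Way 1: each set of size n carries 2^n predicates, each with weight 1/(e * n!).
way1 = sum(2 ** n / math.factorial(n) for n in range(terms)) / math.e

# Way 2: sum over iso classes (n, k), weighted by 1/(e * k! * (n - k)!).
way2 = sum(1 / (math.factorial(k) * math.factorial(n - k))
           for n in range(terms) for k in range(n + 1)) / math.e

print(way1, way2)  # both ≈ e ≈ 2.71828
```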
An important consequence of this is that the probabilities don't depend on how we describe a given structure. Two equivalent definitions will have the same number of automorphisms and hence the same probabilities. For example we could define a group in terms of just its multiplication operation, satisfying a bunch of axioms including the existence of an identity. Or we could define a group in terms of sets equipped with multiplication, inversion and an identity. Either way a given group will have the same automorphism group and hence the same probability. We can use this to easily calculate that a group of size $6$ has a probability $\frac{3}{4}$ of being abelian, because $S_3$ has $6$ automorphisms and $C_6$ only has $2$, giving weights $\frac{1}{6}$ and $\frac{1}{2}$, and $\frac{1/2}{1/2 + 1/6} = \frac{3}{4}$.
I said I'd say something about formulas with quantifiers in them. The simplest such formula is $\forall x . \bot$. This is true only when $S$ is empty. From the semantic point of view the probability is easy to calculate: the distribution gives the empty set the probability $\frac{1}{e \cdot 0!} = \frac{1}{e}$, so $P(\forall x . \bot \mid S) = \frac{1}{e}$.
From the syntactic point of view things are harder. The first thing to notice is that $\frac{1}{e}$ is transcendental. So it will be impossible to derive a probability of $\frac{1}{e}$ by starting with simple axioms and taking finitely many steps ('simple' meaning not mentioning $e$ directly). I will eventually deal with this, but first let's focus on something easier. I will show that the probability of a set having size $n$ is $n$ times less than the probability that a set has size $n - 1$. Again we will use the principle of indifference and the principle that definitions do not change probabilities.
We want to calculate $P(\#S = n \mid S)$. The expression '$\#S = n$' is shorthand for some formula equivalent to the statement that $S$ has $n$ elements. (Usually I like to write $|S|$ for the size of a set, but it's confusing with the vertical bars also being used for conditionals.) We begin by defining a new set $T$ as the coproduct $S + 1$: $$P(\#S = n \mid S) = P(\#S = n \mid S,\ T = S + 1).$$
Under this context, $\#S = n$ is equivalent to $\#T = n + 1$.
Then we note that this is equivalent to a definition of $S$ as the subset of $T$ containing the elements not equal to the added point $\star$. Removing that definition of $S$ leaves $$P(\#T = n + 1 \mid T,\ \star \in T).$$
Note that this has left us with an '$\star \in T$' on the right hand side. Bayes' Theorem lets us bring it to the left: $$P(\#T = n + 1 \mid T,\ \star \in T) = \frac{P(\#T = n + 1,\ \star \in T \mid T)}{P(\star \in T \mid T)}.$$
The expected size of a set with cardinality $n + 1$ is $n + 1$, so we have $P(\#T = n + 1,\ \star \in T \mid T) = (n + 1) \cdot P(\#T = n + 1 \mid T)$. The denominator, $P(\star \in T \mid T)$, is harder to deal with, so for now let us abbreviate this factor as $C$. So we have derived $$P(\#S = n \mid S) = \frac{(n + 1) \cdot P(\#T = n + 1 \mid T)}{C}.$$
By a recursion down to $\#S = 0$ we have: $$P(\#S = n \mid S) = \frac{C^n}{n!} \cdot P(\#S = 0 \mid S).$$
A Poisson distribution!
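Taking the weighting $P(\#S = n) = \frac{1}{e \cdot n!}$ from before, we can check numerically that the constant works out to $C = 1$ and that the recursion holds (my own sanity check, assuming that weighting):

```python
import math

# P(#S = n) = 1/(e * n!) is Poisson with rate 1, and C = E[#S] = 1.
P = lambda n: 1 / (math.e * math.factorial(n))
C = sum(n * P(n) for n in range(100))   # expected size of a random set
for n in range(10):
    # the recursion P(n+1) = C * P(n) / (n+1)
    assert abs(P(n + 1) - C * P(n) / (n + 1)) < 1e-15
print(round(C, 12))  # 1.0
```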
(Next I will explain how to deal with the fact that $\frac{1}{e}$ is transcendental, and how to fix $C$ as $1$.)
Very cool stuff! I suspect what you're doing is constructing a measure/probability measure on the space of formulae for a given language, such that it respects congruence and basic substitution rules?
The fact such spaces are 'finitely generated' in some sense makes it possible to build this measure inductively, sort of.
The fact you can wing higher-order stuff with groupoid cardinality is really fascinating
Matteo Capucci (he/him) said:
Very cool stuff! I suspect what you're doing is constructing a measure/probability measure on the space of formulae for a given language, such that it respects congruence and basic substitution rules?
This is what he's aiming to do. As mentioned, the AI crowd at the Machine Intelligence Research Institute have written papers about such probability measures. The basic idea:
How should a rational agent assign probabilities to statements in mathematics? Of course an omniscient being could assign
• probability 1 to every mathematical statement that’s provable,
• probability 0 to every statement whose negation is provable,
and
• probability ??? to every statement that is neither provable nor disprovable.
But a real-world rational agent will never have time to check all proofs, so there will always be lots of statements it’s not sure about. Actual mathematicians always have conjectures, like the Twin Prime Conjecture, that we consider plausible even though nobody has proved them. And whenever we do research, we’re constantly estimating how likely it is for statements to be true, and changing our estimates as new evidence comes in. In other words, we use scientific induction in mathematics.
How could we automate this? Most of us don’t consciously assign numerical probabilities to mathematical statements. But maybe an AI mathematician should. If so, what rules should it follow?
It's possible that Oscar's work could help here, by providing 'prior probabilities' that we could then update. (I'm actually not sure it does help, but it's a thought.)
Oscar Cunningham said:
I'm not in academia so I probably won't publish an actual paper.
I don't see why you shouldn't ;)
I was going to say that too. If this stuff actually works, and Oscar can either prove it works or assemble enough evidence to make a plausible conjecture, it really deserves a paper. This stuff looks cool!
I can't tell whether he was saying "since I'm not in academia I don't have time to go through the rigamarole of publishing a paper" or "since I'm not in academia probably no decent journal would publish a paper if I wrote one" - or both. If the latter, I hope that's not really true.
Do probability 1 statements necessarily correspond to tautologies @Oscar Cunningham ? If so, that sounds like a solid foundation for some kind of "statistical proof theory"
But if you had some complicated statement that incidentally was a tautology, but wasn't obviously so, shouldn't the formalism not be able to assign a probability (edit: probability of value 1) to the statement?
Like in their example $P(p) = \frac{1}{2}$, shouldn't some tautology be a possible value of $p$, and this theorem then abstracts over this possibility?
Maybe I'm confused, is the idea that a statement has probability one after you establish its truth as a theorem, which tautologies are supposed to encode? I'm not very good at logic.
I really like the idea of preserving probability under equivalence and definition introductions.
Given the soundness and completeness of classical logic, 'theorems' are the same as 'tautologies'. So don't worry about the difference too much now, though it takes work to prove they're the same.
The hard problem is getting a reasonable way of assigning probabilities to statements that doesn't assign probability 1 to every theorem (i.e. every tautology). How do you decide which statements are tautologies 'but not obviously so'?
I'm not saying this is an unsolvable problem - far from it! I just haven't seen anyone solve it.
I don't think there should be a unique way to assign probabilities to statements. The way we assign probabilities can depend on our current knowledge. To turn this into math, we'd have to formalize the idea of a collection of 'known' statements. Maybe we should allow this to be an arbitrary set of statements. Once you start demanding that this set be closed under deduction rules, it tends to become infinite.
In simple terms: if we assume that you know all the consequences of what you know, then you know an infinite amount of stuff. So that's not realistic.
But Oscar's ideas are interesting because they seem to avoid the issue of formalizing what's 'known' or 'obvious'.
Yeah that makes sense to me.
It seems like a solution would be like assigning a (probability) mass to all the points of a space, and it turns out the "space" in this context is the sentences of some formal language - which necessarily includes sentences we have reason to believe are true. If it's first order logic, that would include all the theorems.
You could know a theorem was true, but maybe you wanted to use this formalism to express a student's current understanding, so you'd still want to be able to assign a probability less than one to the theorem the student didn't understand yet. Then you could try to reason out what the student would think was likely, with the usual laws of probability.
Been busy for a couple of weeks, but I've got some time now.
Matteo Capucci (he/him) said:
Very cool stuff! I suspect what you're doing is constructing a measure/probability measure on the space of formulae for a given language, such that it respects congruence and basic substitution rules?
Yes, except to be precise I wouldn't say that I was constructing a probability measure on the space of formulae, because the formulae aren't the individual elements of the measure space but rather correspond to its subsets, the 'events'. The thing that I'm putting the measure on is the set of models, or from a syntactic point of view the set of complete theories.
Morgan Rogers (he/him) said:
Oscar Cunningham said:
I'm not in academia so I probably won't publish an actual paper.
I don't see why you shouldn't ;)
John Baez said:
I was going to say that too. If this stuff actually works, and Oscar can either prove it works or assemble enough evidence to make a plausible conjecture, it really deserves a paper. This stuff looks cool!
I can't tell whether he was saying "since I'm not in academia I don't have time to go through the rigamarole of publishing a paper" or "since I'm not in academia probably no decent journal would publish a paper if I wrote one" - or both. If the latter, I hope that's not really true.
It's a combination of these things. I certainly don't have much free time, as you can tell from the fact it's been two weeks since I last posted on here. The other factor is incentives. Since I'm not an academic there's not necessarily a professional advantage to having these ideas published in a paper. If I just want people to know about them then a blog post will serve equally well. It might be fun to have a paper, but I'm not sure it's worth the additional hours of chasing down references, battling with LaTeX, responding to reviewers, etc..
Morgan Rogers (he/him) said:
Do probability 1 statements necessarily correspond to tautologies Oscar Cunningham ? If so, that sounds like a solid foundation for some kind of "statistical proof theory"
That would be nice, but at the moment I don't think it will work that way. It's like when I throw a dart at a dartboard: any singleton set has area $0$, so the event 'hit the exact centre' has probability $0$, but it's not literally impossible to hit the centre. I have to hit some point. Likewise we might find formulae that have probability $1$ but which aren't tautologies. Definitely all tautologies should have probability $1$.
Alex Kreitzberg said:
But if you had some complicated statement that incidentally was a tautology, but wasn't obviously so, shouldn't the formalism not be able to assign a probability (edit: probability of value 1) to the statement?
I'm not trying to do the same thing as MIRI's Logical Inductors paper, where they have probabilities that slowly update as you do more calculations. In their formalism the probability of a statement like 'the $n$th digit of $\pi$ is $7$' is close to $\frac{1}{10}$ to begin with, and then changes to $0$ or $1$ after you've done enough calculation to determine what the digit is.
I'm not worrying about how long it takes to do calculations, but instead assigning probabilities to formulas that are undecidable. For example given only that $x$ and $y$ are elements of some set $S$, it's undecidable whether or not $x = y$. No amount of further calculation will tell you whether they're equal or not. Nevertheless I claim that we can say that given this state of knowledge it's correct to assign the formula '$x = y$' a probability of $\frac{1}{2}$.
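For what it's worth, that $\frac{1}{2}$ also drops out of the semantic ratio-of-expectations recipe; here's a quick numerical check of mine:

```python
import math

# Weighted average under 1/(e * n!): these expectations are Bell numbers.
E = lambda f: sum(f(n) / math.factorial(n) for n in range(60)) / math.e

# P(x = y | x, y in S) = E[# pairs with x = y] / E[# pairs] = B_1 / B_2
prob = E(lambda n: n) / E(lambda n: n * n)
print(prob)  # ≈ 0.5
```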
My system will assign all theorems probability $1$, all negations of theorems probability $0$, and assign values between $0$ and $1$ to some undecidable statements. There will necessarily be other undecidable statements that can't be assigned a probability at all, otherwise we would be able to contravene Gödel's Theorems. (If we could algorithmically assign a coherent probability to every sentence then we could form a complete extension of our theory by considering all statements in order and calculating their probability given our axioms, adding a new statement to our axioms whenever we found that its probability was greater than $\frac{1}{2}$.) I don't think we'll ever be able to assign a probability to the statement that $M$ obeys the Axiom of Choice, given that $M$ is a model of $\mathrm{ZF}$.
This is interesting. I wonder what is the class of statements to which your method can assign a probability value.
Oscar wrote:
If I just want people to know about them then a blog post will serve equally well.
It's not really equally good, though it may be good enough. I put lots of ideas on blogs, and it seems rather few people notice them until I publish them or at least put them on the arXiv. In fact there are a couple of papers I want to write, to deal with this problem....
Another problem with blog posts is that they aren't subject to peer review, so even people who notice them won't have the same level of trust in them. So if you want people to cite and build on your work, I do think a traditional publication is more effective.
You don't have to write it by yourself either. Even if it doesn't advance your career, letting a grad student participate in the writing would be good for them.
Indeed this project should be on the list of open problems we're talking about elsewhere. But a blog article could help publicize it.
Oscar Cunningham said:
Matteo Capucci (he/him) said:
Very cool stuff! I suspect what you're doing is constructing a measure/probability measure on the space of formulae for a given language, such that it respects congruence and basic substitution rules?
Yes, except to be precise I wouldn't say that I was constructing a probability measure on the space of formulae, because the formulae aren't the individual elements of the measure space but rather correspond to its subsets, the 'events'. The thing that I'm putting the measure on is the set of models, or from a syntactic point of view the set of complete theories.
I see. So you're fixing a language $L$ (or maybe a theory $T$?), and then $P(\phi)$ should correspond to the fraction of models that satisfy $\phi$?