You're reading the public-facing archive of the Category Theory Zulip server.
To join the server you need an invite. Anybody can get an invite by contacting Matteo Capucci at name dot surname at gmail dot com.
For all things related to this archive refer to the same person.
I've discovered an interesting way to extend first-order logic to assign 'probabilities' to logical formulas. For example we get the following values:
Has anything like this already been discovered?
The idea is to treat conditional probabilities $P(\varphi \mid \psi)$ as analogous to sequents $\psi \vdash \varphi$. Irritatingly the conventions change the order of $\varphi$ and $\psi$. I also have a semantic notion which I'll write as $\psi \vDash \varphi$ by analogy to $\psi \vdash \varphi$. One of the interesting things will be that this syntax and these semantics agree with each other despite not initially seeming related.
The syntactic side of things uses the Principle of Indifference, which says that if we don't know anything to distinguish two possibilities, then we should assign them the same probability. I interpret this formally by a rule that says that $\alpha$-equivalent sequents have the same probability.
This reminds me of this paper (logical induction), but I never managed to find the time to read it thoroughly.
Also, Solomonoff tried to define a "universal prior" in his theory of inductive inference.
Ah yes, those are definitely related. A logical inductor also assigns probabilities to logical statements, but my understanding is that the probabilities it assigns to undecidable statements can vary depending on arbitrary implementation details, whereas the method I'm developing gives specific values to them. Likewise the exact values given by Solomonoff induction depend on the implementation.
The other principle that the syntax uses is that introduction of a new symbol via a definition should not change the probabilities. Like if you're playing a game and wondering what card your opponent has, then deciding to call the red cards 'foo' and the black cards 'bar' doesn't change the probability they have the seven of clubs.
Already this allows you to make some deductions.
Let's do a simple example in propositional logic. We want to calculate the probability that some proposition $A$ is true, given that we know nothing else about it. So we want to find $P(A)$.
We can introduce a new proposition $B$ by defining it as $\neg A$. Since definitions don't change probabilities we have $$P(A) = P(A \mid B \leftrightarrow \neg A).$$
Now since we have $B \leftrightarrow \neg A$ we can rewrite $A$ as $\neg B$. Likewise we can rewrite $B \leftrightarrow \neg A$ as the logically equivalent $A \leftrightarrow \neg B$. This gives $$P(A \mid B \leftrightarrow \neg A) = P(\neg B \mid A \leftrightarrow \neg B).$$
Now the right hand side has the form of a definition of $A$, which we may remove: $$P(\neg B \mid A \leftrightarrow \neg B) = P(\neg B).$$
Finally by $\alpha$-equivalence we have $$P(\neg B) = P(\neg A).$$
Separately we have the law of probability $$P(A) + P(\neg A) = 1.$$
Substituting one into the other yields $$P(A) + P(A) = 1,$$
from which it follows that $P(A) = 1/2$.
(I hope I've left people wondering how stupid tricks like the above can lead to a value for the probability that a surjection is also injective. I'll write more later.)
So having seen that $P(A)$ is just 1/2, you can probably guess what the semantics are for propositional logic. We just assign a fair coin flip to each atomic proposition and then take the usual probability. So for example $P(A \mid A \vee B) = 2/3$, since out of the four possible outcomes for two coin flips, three have $A \vee B$, and only two of those have $A$.
Using similar arguments as above we can show that the syntactic reasoning gives the same answers, so we have an analogue of the completeness theorem for propositional logic.
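These coin-flip semantics can be checked by brute-force enumeration; a quick sketch (the code and names are mine, not part of the original argument):

```python
from itertools import product

# Each atomic proposition gets an independent fair coin flip, so the four
# truth assignments to (A, B) are equally likely.
outcomes = list(product([False, True], repeat=2))

satisfies_or = [(a, b) for a, b in outcomes if a or b]  # outcomes where A ∨ B holds
satisfies_a = [(a, b) for a, b in satisfies_or if a]    # ...of which A also holds

# Conditional probability as a ratio of counts of equally likely outcomes.
p = len(satisfies_a) / len(satisfies_or)
print(len(satisfies_or), len(satisfies_a), p)  # 3 2 0.6666666666666666
```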
Cool! When I see numbers like these I think "groupoid cardinality".
The Machine Intelligence Research Institute (Yudkowsky's outfit) had a meeting on trying to assign probabilities to mathematical statements, and I blogged about it here and here. They did not arrive at specific formulas as you are doing. But if you publish something, it'd be worth citing some of their papers, and they might be interested in what you've done.
"groupoid cardinality" :smile: Good intuition!
Yeah, I grew up on LessWrong so I have a good idea of what they're interested in. I also remembered the Progic posts from the n-Category Cafe. It seems that Jon Williamson was also trying to extend logic to probabilities. My plan is to write some things up here as a sounding board and then recompile them more formally as something to link people to. I'm not in academia so I probably won't publish an actual paper.
Moving on to first-order logic, we can consider the example I gave in the first post.
We're being asked the probability that some variables are equal or unequal to each other. We'd like to use the Principle of Indifference like we did before. The Principle applies whenever our information is symmetric between two propositions. But we know that $=$ is an equivalence relation whereas $\neq$ is not, so we are not in fact indifferent between equality and its negation. We can't, for example, conclude that $x = y$ has the same probability as $x \neq y$.
The trick is to forget temporarily what we know about equality, and then remember it again. People sometimes work in first-order logic without equality. Then if they decide they actually do want equality, they can just add a new symbol '$=$' to their language, and add some axioms stating that it's an equivalence relation.
So if we write $P^{-}$ for probabilities without equality, we can imagine adding a binary relation $\sim$ with no particular properties. Then the Principle of Indifference does apply, and using a similar argument to what we did in propositional logic we would get that every way our variables could behave with respect to $\sim$ is equally likely,
and similarly if we have four variables $w, x, y, z$ we would find that all ways that they could behave with respect to $\sim$ were equally likely.
Then we add back the condition that $\sim$ is an equivalence relation on $X$. We find that of those 1024 relations, only 15 are actually equivalence relations. And of those 15, only 3 satisfy $x = y \neq z$. So we have returned to logic-with-equality, and we have $$P(x = y \neq z \mid w, x, y, z : X) = \frac{3}{15} = \frac{1}{5}.$$
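The counts in this syntactic argument can be verified by enumerating equivalence relations (i.e. set partitions) directly. A sketch in Python, where I take the queried formula to be $x = y \wedge y \neq z$ (my reading of the elided example):

```python
def partitions(elems):
    """Yield every partition of the list `elems` as a list of blocks."""
    if not elems:
        yield []
        return
    first, rest = elems[0], elems[1:]
    for part in partitions(rest):
        # Put `first` into each existing block in turn...
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        # ...or into a brand-new block of its own.
        yield [[first]] + part

def same_block(part, a, b):
    return any(a in block and b in block for block in part)

# Equivalence relations on four variables correspond to partitions of {w,x,y,z}.
eq_relations = list(partitions(['w', 'x', 'y', 'z']))

# Those satisfying x = y and y ≠ z.
good = [p for p in eq_relations
        if same_block(p, 'x', 'y') and not same_block(p, 'y', 'z')]

print(len(eq_relations), len(good))  # 15 3
```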
What about the semantic side?
How do we relate first-order logic to actual random variables?
We can imagine that if we actually knew the size of $X$ then things would be easier. Suppose $X$ has a finite cardinality $n$, and that $x$, $y$, and $z$ are chosen from it uniformly at random. Then there's a $1/n$ chance that $x = y$ and an $(n-1)/n$ chance that $z \neq y$. So overall our chance of $x = y \neq z$ would be $\frac{n-1}{n^2}$.
If we had a probability distribution for $n$ we could then take a weighted average of $\frac{n-1}{n^2}$ over these probabilities to get our overall answer.
What probability distribution should we choose? To represent our lack of information about $X$, we want something like the 'uniform distribution over all cardinalities'. But sadly there is no uniform distribution on $\mathbb{N}$, never mind on all cardinalities.
The solution is to use the weighting from the groupoid cardinality. We let the probability of a set be inversely proportional to the size of its automorphism group. So in this case the weight of a set of cardinality $n$ is $\frac{1}{n!}$. I won't explain why groupoid cardinality is a good idea, because there are already articles about it by Terry Tao, Qiaochu Yuan and John Baez. What hope do I have of explaining it better than the holy trinity? Suffice it to say that it gives a natural distribution on finite sets which is in a categorical sense 'uniform' while still converging. (And indeed we could apply it to infinite sets. But their automorphism groups are uncountably infinite, so we'll say they have probability 0.)
To get the weights to behave as probabilities, we have to normalise them so that they sum to $1$. Since $\sum_{n=0}^{\infty} \frac{1}{n!} = e$, we get that the probability of cardinality $n$ is $\frac{1}{e \cdot n!}$.
There's one last thing we need to worry about: the type declaration on the right hand side of our conditional probability. We are used to finding probabilities of propositions. But $x, y, z : X$ is not a proposition, it is a different kind of judgement: a typing judgement. This hasn't caused us problems so far because it has stayed safely on the right hand side of the conditional, where we are merely conditioning on it rather than actually calculating its probability. But I want to apply the definition of conditional probability, which brings something from the right to the left of the conditional.
Taking a leaf out of the 'propositions as types' playbook, I'm going to say that $P(x, y, z : X)$ is the expected number of ways to pick $x$, $y$, and $z$ from $X$, i.e. the weighted average of $n^3$. This can be greater than $1$, which is unusual for a probability, but we are taking the probability of a whole new kind of judgement, so it's fair for them not to behave exactly like propositions. Likewise $P(x = y \neq z,\ (x, y, z : X))$ will be the average number of ways to pick $x$, $y$, and $z$ such that $x = y \neq z$ is satisfied, i.e. the weighted average of $n(n-1)$.
Now we can do our calculation: $$P(x = y \neq z \mid x, y, z : X) = \frac{P(x = y \neq z,\ (x, y, z : X))}{P(x, y, z : X)} = \frac{\sum_n \frac{n(n-1)}{e \cdot n!}}{\sum_n \frac{n^3}{e \cdot n!}}.$$
Finally we bring out our secret weapon: Dobiński's formula. This says that $B_k$ is $\frac{1}{e}\sum_{n=0}^{\infty}\frac{n^k}{n!}$. The Bell number $B_k$ is defined as the number of ways to partition a $k$-element set. So here we have $$\frac{\frac{1}{e}\sum_n \frac{n^2 - n}{n!}}{\frac{1}{e}\sum_n \frac{n^3}{n!}} = \frac{B_2 - B_1}{B_3} = \frac{2 - 1}{5} = \frac{1}{5}.$$
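The weighted sums in this calculation converge fast, so they're easy to check numerically; a sketch (again reading the queried formula as $x = y \wedge y \neq z$, which is my assumption):

```python
import math

# Probability that a set has cardinality n: 1 / (e * n!).
N = 50  # 1/n! decays superexponentially, so 50 terms is plenty
weight = [math.exp(-1) / math.factorial(n) for n in range(N)]

# Denominator: expected number of triples (x, y, z), the weighted average of n^3.
denominator = sum(w * n**3 for n, w in enumerate(weight))
# Numerator: expected number of triples with x = y ≠ z, the weighted average of n(n-1).
numerator = sum(w * n * (n - 1) for n, w in enumerate(weight))

print(round(denominator, 6), round(numerator, 6))  # 5.0 1.0  (B_3 and B_2 - B_1)
print(round(numerator / denominator, 6))           # 0.2
```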
Now you might think that I've taken a few dodgy steps. Either on the syntactic side, or the semantic side, or both. But isn't it interesting that the two completely different methods give exactly the same answer? The agreement between these two calculations is what convinced me that there was something significant to be found here.
(Tomorrow I'll write up assigning probabilities to some formulas with quantifiers in them.)
Oscar Cunningham said:
It seems that Jon Williamson was also trying to extend logic to probabilities.
Just to point out that Jon, an (ex)-colleague at Kent of many years, developed ideas on inductive logic (cf. his book).
This is fascinating stuff, @Oscar Cunningham. If you can check that these assignments of probabilities are 'coherent' in the usual sense (e.g. the probabilities of exhaustive and mutually exclusive events sum to 1), that would be really great. I think that any systematic and elegant way of coherently assigning probabilities to mathematical propositions is interesting - I wouldn't worry too much about what it "really means", or more precisely I hope such worries don't stop anyone from developing such a theory!
Even 'unnormalized probabilities' are interesting:
This can be greater than 1, which is unusual for a probability, but we are taking the probability of a whole new kind of judgement, so it's fair for them not to behave exactly like propositions.
In physics we often use 'unnormalized probabilities', i.e. nonnegative numbers that needn't sum to 1. For example if a system has some finite set of states and the energy of the $i$th state is $E_i$, in thermal equilibrium at temperature $T$ the unnormalized probability of finding the system in the $i$th state is $$e^{-E_i / kT}$$
where $k$ is Boltzmann's constant. Unnormalized probabilities add and multiply the way probabilities do - they add for exhaustive and mutually exclusive events, and multiply for independent events. We extract probabilities from them only at the very end of our calculations, by normalizing them.
I've talked a bit to @Tobias Fritz about this: like me he wants to take these unnormalized probabilities quite seriously, in part because they're governed by the math of convex cones, which is better behaved (or at least simpler) than the math of convex sets.
Btw, I love Dobinski's formula and explained a proof here, where I used it to answer this puzzle:
Suppose raindrops are falling on your head, randomly and independently, at an average rate of one per minute. What’s the average of the cube of the number of raindrops that fall on your head in one minute?
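A simulation of the raindrop puzzle (my sketch; the sampler is Knuth's classic multiplication method for Poisson variables, and the seed and sample count are arbitrary choices):

```python
import math
import random

def poisson1(rng):
    """Sample a Poisson random variable with mean 1 (Knuth's method)."""
    threshold = math.exp(-1.0)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p < threshold:
            return k
        k += 1

rng = random.Random(0)  # fixed seed for reproducibility
samples = 200_000
mean_cube = sum(poisson1(rng) ** 3 for _ in range(samples)) / samples
print(mean_cube)  # ≈ 5, the Bell number B_3
```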
I should have said that the strange greater-than-$1$ probabilities only appear when we take the probability of a type declaration. When we take the probability of a proposition it always comes out to between $0$ and $1$, even if type declarations were used along the way. Like above when we worked out $P(x = y \neq z \mid x, y, z : X)$ as a ratio of weighted sums,
the numerator and denominator were expected numbers of ways rather than ordinary probabilities, but their quotient was a sane $1/5$. If $\varphi$ is some proposition then the probability $P(\varphi \mid x, y, z : X)$, calculated as $P(\varphi,\ (x, y, z : X)) / P(x, y, z : X)$, is always less than $1$ because the ways of choosing the variables that satisfy $\varphi$ form a subset of the set of all ways of choosing them. Because of this I don't think we ever have to do any normalization. We just have to get used to the fact that a probability conditional on a type judgement is calculated as the ratio of two expected values, rather than two mere probabilities.
Regarding a coherence proof, I hope that this will turn out to be trivial from the semantic point of view, since the values there represent actual probabilities. Then proving it from the syntactic point of view will be a matter of proving a completeness theorem saying that the syntactic and semantic probabilities are the same. The problem will be that from the semantic point of view there can be contexts where the expected values don't converge, such as a context containing a function $f : X \to X$, where the weighted average of $n^n$ diverges.
To prove a completeness theorem I will have to deal with these, or more likely find a way to say something useful despite not dealing with them. More on that later.
The other thing I should point out is that we can use the reciprocal of the number of automorphisms to work out the probabilities of any structure, not just sets. Suppose we have a structure $S$ on some set with size $n$. Then any permutation of the set acts on $S$ to give us another structure on it, isomorphic to $S$. But if the permutation is actually an automorphism of $S$ then this structure will just be $S$ again. So the total number of structures isomorphic to $S$ is $\frac{n!}{|\mathrm{Aut}(S)|}$. This multiplies with the factor $\frac{1}{n!}$ we have on sets to give $\frac{1}{|\mathrm{Aut}(S)|}$.
For example, consider sets equipped with a predicate: a set together with a subset of elements for which the predicate holds. One way to weigh these would be to consider that a set of size $n$ has $2^n$ possible predicates on it, so the total weight of them over all sets would be $\sum_n \frac{2^n}{n!} = e^2$. The other way to weigh them would be to say that up to isomorphism a set with a predicate is determined by the size $n$ of the set and the number $k$ of elements for which the predicate is true. An automorphism of such a set is given by a permutation of the true elements and a permutation of the false elements. So the total weight is $\sum_n \sum_{k=0}^{n} \frac{1}{k!\,(n-k)!}$, which one can prove using the binomial theorem is also $e^2$.
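Both weightings can be summed numerically to check that they agree; a quick sketch:

```python
import math

N = 40  # enough terms; 1/n! decays superexponentially

# First weighting: a set of size n carries 2^n predicates, each of weight 1/n!.
total1 = sum(2**n / math.factorial(n) for n in range(N))

# Second weighting: up to iso, a set-with-predicate is given by the number k of
# true elements and n - k false ones; its automorphism group has size k!(n-k)!.
total2 = sum(1 / (math.factorial(k) * math.factorial(n - k))
             for n in range(N) for k in range(n + 1))

print(round(total1, 9), round(total2, 9), round(math.e**2, 9))  # all three ≈ 7.389056099
```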
An important consequence of this is that the probabilities don't depend on how we describe a given structure. Two equivalent definitions will have the same number of automorphisms and hence the same probabilities. For example we could define a group in terms of just its multiplication operation, satisfying a bunch of axioms including the existence of an identity. Or we could define a group as a set equipped with multiplication, inversion and an identity element. Either way a given group will have the same automorphism group and hence the same probability. We can use this to easily calculate that a group of size $6$ has a probability of $3/4$ of being abelian, because $S_3$ has $6$ automorphisms and $C_6$ only has $2$.
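As a concrete instance (I'm reading the elided example as groups of order 6, the standard groupoid-cardinality example): there are exactly two groups of order 6, the abelian $C_6$ with 2 automorphisms and the nonabelian $S_3$ with 6, so the weighting by reciprocal automorphism counts gives:

```python
from fractions import Fraction

# Each structure is weighted by the reciprocal of its automorphism count.
weight_c6 = Fraction(1, 2)  # C_6: Aut(C_6) has order 2
weight_s3 = Fraction(1, 6)  # S_3: Aut(S_3) ≅ S_3 itself, order 6

p_abelian = weight_c6 / (weight_c6 + weight_s3)
print(p_abelian)  # 3/4
```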
I said I'd say something about formulas with quantifiers in them. The simplest such formula is $\forall x . \bot$. This is true only when $X$ is empty. From the semantic point of view the probability is easy to calculate: the distribution gives the empty set the probability $\frac{1}{e \cdot 0!} = \frac{1}{e}$, so the answer is $1/e$.
From the syntactic point of view things are harder. The first thing to notice is that $1/e$ is transcendental. So it will be impossible to derive a probability of $1/e$ by starting with simple axioms and taking finitely many steps ('simple' meaning not mentioning $e$ directly). I will eventually deal with this, but first let's focus on something easier. I will show that the probability of a set having size $n+1$ is $n+1$ times less than the probability that a set has size $n$. Again we will use the principle of indifference and the principle that definitions do not change probabilities.
We want to calculate $P(\#X = n + 1)$. The expression '$\#X = n$' is shorthand for some formula equivalent to the statement that $X$ has $n$ elements. (Usually I like to write $|X|$ for the size of a set, but it's confusing with the vertical bars also being used for conditionals.) We begin by defining a new set $Y$ as the coproduct $X + \{\star\}$, so that $$P(\#X = n) = P(\#X = n \mid Y = X + \{\star\}).$$
Under this context, $\#X = n$ is equivalent to $\#Y = n + 1$.
Then we note that the definition of $Y$ is equivalent to a definition of $X$ as the subset of $Y$ containing the elements not equal to $\star$, and that definition of $X$ may be removed: $$P(\#X = n) = P(\#Y = n + 1 \mid \star : Y).$$
Note that this has left us with a '$\star : Y$' on the right hand side. Bayes' Theorem lets us bring it to the left: $$P(\#Y = n + 1 \mid \star : Y) = \frac{P(\star : Y,\ \#Y = n + 1)}{P(\star : Y)}.$$
The expected number of ways to pick $\star$ from a set of cardinality $n + 1$ is of course $n + 1$, so the numerator is $(n + 1) \cdot P(\#Y = n + 1)$, which by $\alpha$-equivalence is $(n + 1) \cdot P(\#X = n + 1)$. The denominator, $P(\star : Y)$, is harder to deal with, so for now let us abbreviate this factor as $C$. So we have derived $$P(\#X = n) = \frac{(n + 1) \cdot P(\#X = n + 1)}{C}.$$
By a recursion down to $\#X = 0$ we have: $$P(\#X = n) = \frac{C^n}{n!} \cdot P(\#X = 0).$$
A Poisson distribution!
(Next I will explain how to deal with the fact that $1/e$ is transcendental, and how to fix the value of $C$ as $1$.)
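The recursion just derived, $P(\#X = n + 1) = P(\#X = n) \cdot C / (n + 1)$, can be iterated and normalised numerically; a sketch taking the constant $C = 1$ (an assumption, matching the $1/(e \cdot n!)$ weights used earlier):

```python
import math

N = 30
# Iterate the recursion P(n+1) = P(n) * C / (n+1) with C = 1,
# starting from an unnormalised P(0) = 1.
unnormalised = [1.0]
for n in range(N - 1):
    unnormalised.append(unnormalised[-1] / (n + 1))

Z = sum(unnormalised)            # normalising constant; should be e
probs = [u / Z for u in unnormalised]

# Compare with the Poisson(1) pmf e^{-1} / n!.
poisson = [math.exp(-1) / math.factorial(n) for n in range(N)]
max_diff = max(abs(a - b) for a, b in zip(probs, poisson))
print(round(Z, 9), max_diff < 1e-12)  # 2.718281828 True
```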
Very cool stuff! I suspect what you're doing is constructing a measure/probability measure on the space of formulae for a given language, such that it respects congruence and basic substitution rules?
The fact such spaces are 'finitely generated' in some sense makes it possible to build this measure inductively, sort of.
The fact that you can wing higher-order stuff with groupoid cardinality is really fascinating.