I've been talking a bit with Hong Qian at the University of Washington. He works on nonequilibrium thermodynamics and biophysics... not particularly using categories, but he's full of cool ideas I'd like to understand and make more mathematical.
We're thinking of maybe having a regular series of conversations. Would you be interested in attending those, Owen? I think I'd have them instead of our usual meetings, rather than on top of them, for a while, since I'm getting involved in too many meetings. Maybe we could alternate between meetings with just you and me, and meetings with him (and some other folks).
Here's an example of a paper he's written, which I'd like to understand:
Abstract. We generalize the convex duality symmetry in Gibbs' statistical ensemble formulation, between Massieu's free entropy and the Gibbs entropy as a function of mean internal energy U. The duality tells us that Gibbs thermodynamic entropy is to the law of large numbers (LLN) for arithmetic sample means what Shannon's information entropy is to the LLN for empirical counting frequencies. Following the same logic, we identify U as the conjugate variable to counting frequency, a Hamilton-Jacobi equation for Shannon entropy as an equation of state, and suggest an eigenvalue problem for modeling statistical frequencies of correlated data.
I don't know what all this means, but it uses a lot of the words I'm thinking about these days. :upside_down:
Yes I definitely would be interested! I've heard about Hong Qian through several different sources, and I've always been interested in learning more about the stuff he is doing. I think specifically he's mentioned in Haddad.
I think alternating sounds like a good idea.
Actually, I just did a youtube video (not so well done...) explaining Shannon entropy to my friend who is a chemist by using this derivation of Shannon entropy from empirical counting frequencies.
(not from the paper though)
OK, I'm starting to get the hang of what's going on in this paper, and it's making a lot of sense so far.
The basic setup is that you have some summary statistic of a collection of $N$ i.i.d. variables, and you are interested in the asymptotic distribution of this summary statistic as $N \to \infty$. The large deviations principle says that this distribution should look like $e^{-N I(x)}$, and the claim is that we should be thinking about the rate function $I$ as a "generalized entropy".
In the case that the summary statistic is the frequency distribution of discrete variables, then the entropy is the relative entropy with respect to the distribution of the i.i.d. variables. In the case that the summary statistic is the mean value, then the entropy is the Gibbs entropy.
As $N \to \infty$, you are only ever going to see values of the summary statistic that are at minima of $I$. Why minima, when we typically maximize entropy? Well, in the first case it's because we are minimizing relative entropy. In the second case, it's because there's a sign flip.
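In symbols (standard notation, not necessarily the paper's): the statement is
$$P\big(\text{statistic} \approx x\big) \asymp e^{-N I(x)},$$
where in the empirical-frequency case $I(\nu) = \sum_i \nu_i \ln\frac{\nu_i}{p_i}$ is the relative entropy, and in the sample-mean case $I(x) = \sup_\lambda\big(\lambda x - \ln E(e^{\lambda X})\big)$ is a Legendre transform.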
In the paper, they state the large deviations principle as both $e^{-N I(x)}$ and $e^{N I(x)}$, so... :shrug:
I like this a lot, because it gives a justification for entropy maximization that is rooted in probability, and explicitly tied to the fact that we are looking at systems with many independent degrees of freedom.
It also gives us a way of making an entropy function for any observable. Namely, use the large deviations principle for mean values of that observable over many independent samplings.
The empirical frequency is a special case of this with the observable that sends state $i$ to the $i$th basis vector in $\mathbb{R}^n$, where the variable can take on any of $n$ states.
This also explains Shannon entropy as being a kind of "initial" entropy, because the mean values of other observables can be calculated from the empirical frequency distribution.
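Concretely: if the variable takes values in $\{1,\dots,n\}$ and $\nu_N(i)$ is the empirical frequency of state $i$ after $N$ samples, then for any observable $g$
$$\frac{1}{N}\sum_{k=1}^N g(X_k) = \sum_{i=1}^n \nu_N(i)\, g(i),$$
so the sample mean of any observable is a linear function of the empirical frequency, and its large deviations follow from Sanov's theorem via the contraction principle.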
But this starts to get interesting when we consider situations where the samples are not i.i.d., but we can still express a large deviations principle. For instance, the mean magnetization in the Ising model.
Ah, I now see that they talk about statistics on a Markov chain! Very exciting!
Owen Lynch said:
It also gives us a way of making an entropy function for any observable. Namely, use the large deviations principle for mean values of that observable over many independent samplings.
Cool stuff! Is this entropy function convex? What's the domain of this entropy function, anyway?
The domain of the entropy function is the codomain of the statistic.
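(If a numerical illustration helps: here's a rough Monte Carlo sketch - not from the paper, just Bernoulli coin flips - where the rate function is known in closed form, is convex, and has domain $[0,1]$, the codomain of the statistic.)

```python
# Rough Monte Carlo check of Cramer's theorem for Bernoulli(p) samples
# (an illustration only, not taken from the paper).  For Bernoulli(p) the
# rate function is the relative entropy D(Bernoulli(x) || Bernoulli(p)),
# which is convex on [0, 1], the codomain of the "sample mean" statistic.
import numpy as np

def rate(x, p):
    """Cramer rate function for Bernoulli(p) at 0 < x < 1."""
    return x * np.log(x / p) + (1 - x) * np.log((1 - x) / (1 - p))

rng = np.random.default_rng(0)
p, x, trials = 0.5, 0.6, 500_000

for n in (50, 100, 200):
    sample_means = rng.binomial(n, p, size=trials) / n
    tail = (sample_means >= x).mean()          # estimate of P(mean >= x)
    print(f"n={n:3d}  -(1/n) log P = {-np.log(tail)/n:.4f}   I(x) = {rate(x, p):.4f}")

# The estimates approach I(x) only slowly, because of subexponential
# prefactors, but the exponential decay rate is already visible.
```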
So, @Owen Lynch and I are trying to understand this paper:
but so far only up to around equation (6).
The first two big things I didn't understand were allusions to Cramer’s theorem and Sanov's theorem.
Luckily Wikipedia explains them both; they're both really great theorems.
Cramer's theorem starts by assuming you have a function $f$ on a probability measure space, and directs your attention to the function
$$\lambda \mapsto \ln E\big(e^{\lambda f}\big)$$
where $E$ means 'expected value'.
They call this the cumulant generating function because if you expand this as a Taylor series in $\lambda$ the coefficients are called the cumulants of $f$.
Unfortunately, if you have no feeling for cumulants, this definition won't help much - but it is the simplest definition of 'cumulant'.
But you can re-express the cumulants in terms of the moments of $f$, namely the expected values $E(f^n)$.
Also, cumulants have a bunch of nice properties.
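For instance, the first few cumulants in terms of moments are
$$\kappa_1 = E(f), \qquad \kappa_2 = E(f^2) - E(f)^2, \qquad \kappa_3 = E\big((f - E(f))^3\big),$$
so $\kappa_2$ is the variance; and one of the nice properties is that cumulants of sums of independent variables simply add.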
John Baez said:
Cramer's theorem starts by assuming you have a function $f$ on a probability measure space, and directs your attention to the function...
we normally call $f$ a variable :)
As for me, I'd prefer to think like a physicist and call $f$ the Hamiltonian, $H$. Then
$$E\big(e^{-\beta H}\big)$$
is famous: it's the partition function. We usually write $\beta$, which stands for coolness, i.e. inverse temperature.
Whoa, I never thought about Cramer's theorem like that!
Then
$$\ln Z(\beta) = \ln E\big(e^{-\beta H}\big)$$
is also famous; it's almost the free energy of our system.
Actually the Helmholtz free energy is
$$F = -\frac{1}{\beta}\ln Z(\beta)$$
where again $\beta$ is the coolness: the reciprocal of temperature in units where Boltzmann's constant is 1.
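For reference, the textbook formulas here: the Gibbs state at coolness $\beta$ has density $e^{-\beta H}/Z(\beta)$ with respect to the underlying measure, and
$$Z(\beta) = E\big(e^{-\beta H}\big), \qquad \frac{d}{d\beta}\ln Z(\beta) = -\langle H\rangle_\beta,$$
so differentiating the log of the partition function gives minus the mean energy.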
Owen Lynch said:
John Baez said:
Cramer's theorem starts by assuming you have a function $f$ on a probability measure space, and directs your attention to the function...
we normally call $f$ a variable :)
Yes, it's a 'random variable', which is a function on a probability measure space. Since I'm going for a physics interpretation I might call it an 'observable'.
But how about 'function', since that's all it really is. :upside_down:
Okay, now let me try to state Cramer's theorem, which I'd never known about before. But I'll state it using some physics language just to keep Owen entertained.
So I'll call
$$\ln Z(\beta)$$
the log of the partition function where
$$Z(\beta) = E\big(e^{-\beta H}\big)$$
is the partition function and remember that $-\frac{1}{\beta}\ln Z(\beta)$ is called the free energy (Helmholtz free energy).
Now, Cramer's theorem starts by considering something wacky-sounding, the Legendre transform of the log of the partition function:
$$\Lambda^*(x) = \sup_\beta\Big(\beta x - \ln Z(\beta)\Big).$$
With luck this will turn out not to be so wacky; I'm hoping it's something somewhat familiar in thermodynamics! Maybe something like the entropy as a function of energy? I need to calculate a bit.
But anyway, Cramer's theorem says
$$P\left(\frac{X_1 + \cdots + X_n}{n} \le x\right) \sim e^{-n\,\Lambda^*(x)}$$
where $X_1, X_2, \dots$ are independent identically distributed random variables, all distributed just like the Hamiltonian $H$.
I'm afraid I may be getting some signs wrong due to my Hamiltonian being paired with $-\beta$, not $+\beta$, in the partition function.
For example, I'd feel happier with
$$P\left(\frac{X_1 + \cdots + X_n}{n} \ge x\right) \sim e^{-n\,\Lambda^*(x)}$$
because this would be looking at the probability of measuring the energy $n$ times and getting a total of more than $nx$. Since it's usually improbable for a system to have very large energy in statistical mechanics (the probability drops off exponentially), this would smell like a "large deviations" result, which I think is what we're shooting for.
But if everything works out, it seems maybe Cramer's theorem is relating the entropy of the state of thermodynamic equilibrium at energy $x$ to the probability that repeated measurements of the energy give a result of more than $x$ on average.
That's where I am in understanding Cramer's theorem.
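A concrete sanity check, just the textbook Gaussian case (nothing to do with Hong's paper): if the $X_i$ are Gaussian with mean $\mu$ and variance $\sigma^2$, then
$$\ln E\big(e^{t X_i}\big) = \mu t + \tfrac{1}{2}\sigma^2 t^2, \qquad \sup_t\Big(t x - \mu t - \tfrac{1}{2}\sigma^2 t^2\Big) = \frac{(x-\mu)^2}{2\sigma^2},$$
so the probability that the sample mean exceeds $x > \mu$ decays like $e^{-n(x-\mu)^2/2\sigma^2}$.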
I found these notes on the "Cramér transform" useful for understanding some of this stuff and its relationship to convex analysis. (It's an appendix to the book Linear and Integer Programming vs Linear Integration and Counting by Lasserre.) It presents Cramér's theorem as a transform that turns convolutions into infimal convolutions, which seems like a useful perspective.
(But re-examining it, I was mixing up "Cramér's theorem" with this "Cramér transform" and those notes don't really talk about large deviations, so it's maybe less immediately relevant than I thought, sorry. But they may be useful for seeing how it connects to the convex function stuff in your paper.)
Quick check: the free energy is $F = -\frac{1}{\beta}\ln Z$, so if we plug that into the Legendre transform, then we have $\beta x - \ln Z(\beta) = \beta x + \beta F(\beta)$.
Then
$$\Lambda^*(x) = \sup_\beta\, \beta\big(x + F(\beta)\big).$$
If we instead identify $\frac{1}{\beta}\ln Z$ (without the minus sign) with the free energy, then we get $\Lambda^*(x) = \sup_\beta\,\beta\big(x - F(\beta)\big)$, so I think maybe that's the right sign convention?
Thanks! We'll eventually figure it out. The trick with these things is to get a nice solid idea, then the minus signs and factors of 2 are bound to fall in line if you keep working at it.
Wikipedia says free energy is $F = -kT\ln Z$, with a minus sign, and I did this calculation myself on page 12 of Relative entropy in biological systems.
You can just calculate $\langle H\rangle - TS$ and show
$$\langle H\rangle - TS = -kT\ln Z,$$
which is the free energy. Here $\langle H\rangle$ is the expected value of $H$ in the Gibbs state at temperature $T$ - that's what you're calling $U$.
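Sketch of that calculation (a reminder of the textbook argument, with Gibbs state $p_i = e^{-\beta H_i}/Z$ and $\beta = 1/kT$):
$$S = -k\sum_i p_i\ln p_i = k\sum_i p_i\big(\beta H_i + \ln Z\big) = k\beta\langle H\rangle + k\ln Z,$$
so $TS = \langle H\rangle + kT\ln Z$ and hence $\langle H\rangle - TS = -kT\ln Z$.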
Agh I've just stared at this for like 10 minutes trying to spot my sign error, and I can't figure out what has gone wrong! Probably it will be more obvious in the morning...
Yeah, luckily a lot of sign errors go away after a night's sleep.
Here's something odd! In Hong's paper, he uses a different sign for the Legendre transform!
He uses a $-$ where we have a $+$, and a $+$ where there is a $-$ in the statement of Cramer's theorem!
This is all highly suspect....
Using a $-$ here does get entropy out, but it's not a Legendre-Fenchel transform that I'm familiar with!!
He even uses this definition for the Legendre-Fenchel transform later on!!
This is when he's talking about large deviations
Note that he also uses an inf instead of the sup that is used in Cramer's theorem
What we end up getting here is that the infimum happens at the $\beta$ where $\frac{d}{d\beta}\ln Z(\beta) = -x$, which actually may be the right thing to do here; let me quickly recalculate what the derivative of the log of the partition function is.
Aha! It's because in Cramer's theorem, they define the log partition function as $\ln E\big(e^{\beta H}\big)$, whereas the log partition function in statistical mechanics is $\ln E\big(e^{-\beta H}\big)$!
So it makes sense to take the Legendre transform with respect to "$-\beta$", as it were
OK, everything is fixed and makes sense now
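To spell that out (a sketch, assuming Hong's version of the transform is $\inf_\beta\big(\beta x + \ln Z(\beta)\big)$): writing $s = -\beta$,
$$\ln E\big(e^{sH}\big) = \ln Z(\beta), \qquad \sup_s\Big(s x - \ln E\big(e^{sH}\big)\Big) = \sup_\beta\Big(-\beta x - \ln Z(\beta)\Big) = -\inf_\beta\Big(\beta x + \ln Z(\beta)\Big),$$
so the inf with flipped signs is the same quantity as the sup in Cramer's theorem.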
Can you write up a statement of the key result, using conventions that match the usual conventions in thermodynamics / statistical mechanics? I was trying to do that myself. Something like:
If you start with a function $H$ on a measure space, you can define the partition function
$$Z(\beta) = E\big(e^{-\beta H}\big)$$
and then take its logarithm, and then....
[fill in details here]
... take the Legendre transform, and then....
[fill in details here]
.... and finally you get a function $S$ such that
$$P\left(\frac{H_1 + \cdots + H_n}{n} \ge x\right) \sim e^{n S(x)}$$
(or is it $e^{-n S(x)}$?)
Could you please fill in the details, getting all the signs right and all the sups and infs right and the $\ge$ versus $\le$ right?
Sure!
Thanks!
Maybe you could write it here.
Oh, I'm writing up a document that I'm going to put as a pdf here
I just finished handwriting it, so now I just have to type it up
Okay, that's probably more useful in the long run.
Alright, I wrote up everything purely pertaining to Cramer's theorem, and then started moving on to writing up why the log of the partition function is related to free energy, but I can't look at the words "partition function" one more time today, so finishing that bit will have to wait. The first part of this document is complete though, and answers your question. cramers_theorem.pdf
Great! I'm checking it out. I'm glad you stated a "thermodynamic version" in its own box.
I know why and how the log of the partition function relates to free energy, but it'll be good to carry it forward to the point of giving a nice thermodynamic interpretation of the function $\Lambda^*$.
Hi! Could you finish up this document so we can talk about it on Monday?
I'll try my best; I realized that what I was trying to do was actually to prove something that is false (I think)!
I.e., I think the thermodynamic entropy defined by this Legendre transform does not end up equaling the Shannon entropy of the canonical ensemble
So that has shaken my understanding of things a bit, and I need to now understand why this thermodynamic entropy is important
Ah, I may have found part of the problem
I think I was using the "state entropy", which is the entropy of the random variable that determines which state one is in. However, different states might have the same energy, so the energy variable has a lower entropy than the state variable. I'm going to try redoing my calculations using the entropy of the energy variable instead.
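(A tiny example of that gap: three equally likely states with energies $\epsilon, \epsilon, \epsilon'$ have state entropy $\ln 3 \approx 1.10$, but the energy variable takes the value $\epsilon$ with probability $2/3$ and $\epsilon'$ with probability $1/3$, so its entropy is only $\tfrac{2}{3}\ln\tfrac{3}{2} + \tfrac{1}{3}\ln 3 \approx 0.64$.)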
No, that wasn't the problem, I had just made a dumb mistake somewhere else
It all checks out now
Great, I'll read it now and we can talk about it in 22 minutes.
Wait, read this version! cramers_theorem.pdf
Slightly more stuff :D
Okay. In the version I just read, you don't finish the job and say what the thermodynamic meaning of $\Lambda^*$ is.
I'll try the new one!
Note: I changed the notation to be in line with Hong Qian's paper, because I was getting confused going between the two
Okay. Now you say the rate function - the Legendre transform of the log of the partition function - is minus the entropy (under some conditions).
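To spell out that identification with the conventions from earlier in this thread (a sketch; the exact signs depend on the conventions in the write-up): the sup is attained at the $\beta$ with $\langle H\rangle_\beta = x$, and there
$$\sup_\beta\Big(-\beta x - \ln Z(\beta)\Big) = -\beta\langle H\rangle_\beta - \ln Z(\beta) = -S(\beta),$$
using $S = \beta\langle H\rangle + \ln Z$ for the Gibbs state (with Boltzmann's constant set to 1).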
Yes!