Category Theory
Zulip Server
Archive



Stream: learning: questions

Topic: Relative entropy variable contributions


view this post on Zulip Kalan Kucera (Aug 11 2024 at 19:47):

I was wondering, wrt relative entropies between objects of a category: could you define specific contributions to the overall relative entropy measure due to a particular variable?

Let's say that I have a category of models, with three models p, q, and r in it. I take 'r' to be a reference model, and then can look at the relative entropy between the other models and it, K(p||r) and K(q||r). If, for instance, both probability distributions created by models p and q had an unspecified (but seemingly measurable via K) dependence on some variable x of the model, would it be valid to:
1) directly compare the relative entropy measures for the two models?
2) create a function (K/x) to quantify the variable's contribution to shifting the models q and p away from the reference r?

view this post on Zulip JR (Aug 11 2024 at 21:22):

Kalan Kucera said:

I was wondering, wrt relative entropies between objects of a category: could you define specific contributions to the overall relative entropy measure due to a particular variable?


Perhaps the Jensen-Shannon divergence of a uniform mixture of p, q, and r
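
To make that concrete: for several distributions under a uniform mixture, the (generalized) Jensen-Shannon divergence is the entropy of the mixture minus the average entropy of the components. A minimal sketch in Python, with made-up discrete distributions standing in for p, q, and the reference r:

```python
import numpy as np
from scipy.stats import entropy  # Shannon entropy / KL divergence

def jsd_uniform(*dists):
    """Generalized Jensen-Shannon divergence of several discrete
    distributions under a uniform mixture: H(mean) - mean(H(p_i))."""
    dists = [np.asarray(d, dtype=float) for d in dists]
    m = np.mean(dists, axis=0)               # uniform mixture
    return entropy(m) - np.mean([entropy(d) for d in dists])

# Hypothetical discrete "models" on the same outcome space
p = np.array([0.70, 0.20, 0.10])
q = np.array([0.60, 0.25, 0.15])
r = np.array([0.50, 0.30, 0.20])   # reference model

print(jsd_uniform(p, q, r))        # one number summarizing how spread out the three models are
```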

view this post on Zulip Kalan Kucera (Aug 11 2024 at 21:38):

JR said:

Perhaps the Jensen-Shannon divergence of a uniform mixture of p, q, and r

In that scenario, would I just vary x, and then the JSD would tell me when the variation in x had an equivalent impact on the distributions of p and q relative to r?

view this post on Zulip Eric M Downes (Aug 11 2024 at 21:56):

Second the Jensen-Shannon divergence.

It sounds like you are working in ordinary probability / statistics ("models", "variance", "distributions", etc.). If you need to work in a weaker setting, can you specify which category you are working in? Or paper(s) you're working from?

Does your category have an equivalent for Bayes' Law and marginal distributions? If so, say $p_X, q_X$ are two joint distributions over a product of variables $X = x \times y \times z \times \ldots$, their marginals over just $x$ are $p_x, q_x$, and the distributions conditioned on $x$ are $p_{\ldots|x}, q_{\ldots|x}$; then
$$K(p_X \| q_X) \leq K(p_x \| q_x) + K(p_{\ldots|x} \| q_{\ldots|x})$$
likely holds, which may be helpful.
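
For ordinary discrete distributions this kind of splitting can be checked directly; the exact version is the chain rule for relative entropy, where the conditional term is averaged over the marginal of $x$. A minimal sketch with made-up 2x2 joint distributions:

```python
import numpy as np
from scipy.stats import entropy  # entropy(p, q) = KL(p || q)

# Hypothetical 2x2 joint distributions over (x, y); rows index x
pXY = np.array([[0.30, 0.20],
                [0.25, 0.25]])
qXY = np.array([[0.20, 0.30],
                [0.30, 0.20]])

# Marginals over x
px, qx = pXY.sum(axis=1), qXY.sum(axis=1)

# Conditionals y | x
p_y_given_x = pXY / px[:, None]
q_y_given_x = qXY / qx[:, None]

kl_joint    = entropy(pXY.ravel(), qXY.ravel())
kl_marginal = entropy(px, qx)
# Expected conditional divergence, weighted by p(x) -- the chain rule term
kl_cond     = sum(px[i] * entropy(p_y_given_x[i], q_y_given_x[i])
                  for i in range(len(px)))

print(kl_joint, kl_marginal + kl_cond)   # equal, by the chain rule
```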

view this post on Zulip Kalan Kucera (Aug 11 2024 at 22:14):

I think this will do the trick, thank you both for the suggestion. I am trying to apply some of this analysis to a category I'm attempting to construct for some physical measurements (if you're familiar with creep in metals?), on the basis of the random distributions of test results. There are two different sets of data for these tests: the raw results from the characterization method, and the results determined by the model of the test. I'm looking at the empirical data sets (where the only way the affecting variables arise is through applied constants and curve fitting) relative to the model results (variables applied in roughly the same form found from curve fitting). A kind of "relative entropy minimization", I suppose.

I want to say that because, in experiment, we know there is an effect from temperature, say, we could create some variable representing the portion of the relative entropy produced by variation in temperature, based on the difference in KL.

What I'm hearing from y'all is that, for two different models, if the temperature had the same effect relative to the model, their JSD should be... 0?

view this post on Zulip Eric M Downes (Aug 11 2024 at 22:24):

Yeah I would definitely recommend just working in ordinary probability theory then. (Also see above my correction to an inequality.)

Tai-Danae Bradley does some good stuff on how to isolate variables without losing information (as one usually does when taking a marginal) in her thesis -- it's categorical but very readable, and since you're doing physics, the quantum-mechanical analogies she makes should hopefully be pretty familiar to you.

view this post on Zulip Eric M Downes (Aug 11 2024 at 22:37):

Kalan Kucera said:

What I'm hearing from y'all is that, for two different models, if the temperature had the same effect relative to the model, their JSD should be... 0?

Unfortunately I'm still a bit uncertain of what you're actually doing, and I don't want to guess.

The intuitive explanation of what's going on is that relative entropy $K(p \| q)$ can be thought of as a distance on a manifold (like a curved surface). But that manifold is different from the one on which $K(q \| p)$ measures distances. Thus $K$ fails to be a metric (not symmetric). The JSD forcibly symmetrizes this so that you end up with $\mathrm{JSD}(p \| q) = \mathrm{JSD}(q \| p)$, and thus a metric. It necessarily loses some subtlety in doing so, however...
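
A quick numerical illustration of that asymmetry, with two made-up discrete distributions: the two KL directions differ, while the two-distribution JSD is the same in either order.

```python
import numpy as np
from scipy.stats import entropy  # entropy(p, q) = KL(p || q)

def jsd(p, q):
    """Two-distribution Jensen-Shannon divergence via the mixture m."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    m = 0.5 * (p + q)
    return 0.5 * entropy(p, m) + 0.5 * entropy(q, m)

p = [0.80, 0.15, 0.05]
q = [0.40, 0.40, 0.20]

print(entropy(p, q), entropy(q, p))  # K(p||q) != K(q||p) in general
print(jsd(p, q), jsd(q, p))          # identical: JSD is symmetric
```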

I expect what you're doing can probably be expressed just by looking at covariances vs conditional covariances. That's much easier to think about and calculate. If not, and you can produce a scatter plot showing that say a covariance is zero but the two variables are obviously not independent just by looking at the scatter plot, then indeed information theory can definitely help you.
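
As a quick illustration of that last point, with synthetic data unrelated to creep: a standard normal variable and its square have covariance near zero yet are completely dependent, and a crude binned mutual-information estimate picks the dependence up.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = x**2                      # fully dependent on x, but cov(x, y) ~ 0

print(np.cov(x, y)[0, 1])     # near zero: covariance misses the dependence

# Crude binned mutual-information estimate (nats), just for illustration
counts, _, _ = np.histogram2d(x, y, bins=30)
pxy = counts / counts.sum()
px = pxy.sum(axis=1, keepdims=True)
py = pxy.sum(axis=0, keepdims=True)
nz = pxy > 0
mi = np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz]))
print(mi)                     # clearly positive: the variables are not independent
```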

view this post on Zulip Kalan Kucera (Aug 11 2024 at 23:27):

I don't think that the result necessarily needs to be symmetric; each distribution corresponding to a specific creep model may, or may not, contribute equally to the overall phenomenon.

Covariance could capture it, perhaps, I will mess with my data and see what pops out. I like working with information because the interpretation of what information represents physically is kind of what I'm trying to get at.

e.g., mechanisms in my field are particular pathways that some State Change of a physical system may undergo... I'm trying to show that, given categories of models, there can be some equivalence of pathways through state changes on the basis of some measure of information. I like KL because the image of a shift away from a model (which may be accurate and precise to reality to varying degrees) works into that concept for me.

I don't know if any of that is clarifying lol. Thank you for the tips though, I really appreciate it!!

view this post on Zulip Kalan Kucera (Aug 12 2024 at 00:05):

A follow-up: if I was going to look at the JSD, and had say 3 exp. models (P, Q, S) and 1 reference model (R), would I be calculating different "information radii" relative to R if I calculated JSD(P||Q), JSD(Q||S) and JSD(P||S)?

view this post on Zulip Eric M Downes (Aug 12 2024 at 00:09):

JSD works just like a metric so all your intuition applies, in particular
$$d(x,z) \leq d(x,y) + d(y,z)$$
for any metric, so you can measure something useful by looking at the rhs minus the lhs. You could try that for different models $y$.
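
A concrete way to play with that slack, using made-up discrete models P, Q, S: scipy's `jensenshannon` returns the Jensen-Shannon distance (the square root of the divergence), which does satisfy the triangle inequality, so the right-hand side minus the left-hand side is guaranteed non-negative.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon  # JS *distance* = sqrt(JS divergence)

# Hypothetical discrete models on the same outcome space
P = np.array([0.70, 0.20, 0.10])
Q = np.array([0.55, 0.30, 0.15])
S = np.array([0.40, 0.35, 0.25])

d_PQ = jensenshannon(P, Q)
d_QS = jensenshannon(Q, S)
d_PS = jensenshannon(P, S)

# Triangle-inequality slack: d(P,S) <= d(P,Q) + d(Q,S)
slack = (d_PQ + d_QS) - d_PS
print(d_PQ, d_QS, d_PS, slack)   # slack >= 0; a small slack means Q lies nearly "between" P and S
```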

view this post on Zulip Eric M Downes (Aug 12 2024 at 01:57):

But, if you're doing physics... do you know what the (physical) entropy of the states at the beginning and end of the paths is? Or a state equation? There are a lot of calculations from thermodynamics that come to mind if you do.