I was reading a paper and encountered a figure that showed the correlation, mutual information, and mean-squared prediction error for a pair of time series. This seemed a bit redundant. It turns out it was added to the paper at the request of a reviewer. If your data are jointly Gaussian, these all measure the same thing; no need to clutter a figure by showing all of them.
For a jointly Gaussian pair of random variables, correlation, mutual information, root-mean-squared error, and signal-to-noise ratio are all equivalent and can be computed from each other.
Some identities
Consider two time series x and y that can be well-approximated as jointly Gaussian. To simplify things, let x and y have zero mean and unit variance (the math still works out without this assumption, but it's also easy to ensure by z-scoring the data). Also, let n be a zero-mean unit-variance Gaussian random variable that captures noise, i.e. fluctuation in y that cannot be explained by x.
Let's say we're interested in a linear relationship between x and y:
$$ y = a\,x + b\,n. $$
The linear dependence of y on x is summarized by a single parameter, the signal gain a (the noise gain b will turn out to be determined by a).
Since the signal and noise are independent, their variances combine linearly:
$$ \sigma_y^2 = a^2\,\sigma_x^2 + b^2\,\sigma_n^2. $$
The sum $a^2+b^2$ is constrained by the variances of x, y, and n. In this example we've assumed these are all 1, so
$$ a^2 + b^2 = 1. $$
Incorporate this constraint by defining $\alpha = a^2$ and writing
$$ \sigma_y^2 = \alpha\,\sigma_x^2 + (1-\alpha)\,\sigma_n^2 $$
and
$$ y = \sqrt{\alpha}\,x + \sqrt{1-\alpha}\,n. $$
(We'll show later that α is the squared Pearson correlation coefficient, i.e. it is the coefficient of determination.)
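As a quick sanity check, here is a minimal simulation sketch of the model above (assuming Python with NumPy; the value α = 0.6 is an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 0.6                       # assumed signal fraction of y's variance (illustrative)
N = 100_000

x = rng.standard_normal(N)        # zero-mean, unit-variance signal
n = rng.standard_normal(N)        # zero-mean, unit-variance noise, independent of x
y = np.sqrt(alpha) * x + np.sqrt(1 - alpha) * n

print(np.var(y))                  # ~1.0, since alpha*1 + (1 - alpha)*1 = 1
print(np.corrcoef(x, y)[0, 1])    # ~sqrt(alpha), as derived below
```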
From this the signal-to-noise ratio and mutual information can be calculated
The signal-to-noise ratio (SNR) is the ratio of the signal and noise contributions to the variance of y, and simplifies as
$$ \mathrm{SNR} = \frac{\sigma_{ax}^2}{\sigma_{bn}^2} = \frac{\alpha\,\sigma_x^2}{(1-\alpha)\,\sigma_n^2} = \frac{\alpha}{1-\alpha}. $$
For jointly Gaussian variables, the mutual information I (in bits, if using $\log_2$) is a monotonic function of the SNR, and simplifies as:
$$ I = \tfrac{1}{2}\log_2(1+\mathrm{SNR}) = \tfrac{1}{2}\log_2\frac{\sigma_y^2}{\sigma_{bn}^2} = \tfrac{1}{2}\log_2\frac{\sigma_y^2}{(1-\alpha)\,\sigma_n^2} = \tfrac{1}{2}\log_2\frac{1}{1-\alpha}. $$
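A small numerical sketch (again Python/NumPy, with the same arbitrary α) confirms that the two expressions for I agree:

```python
import numpy as np

alpha = 0.6                                # same illustrative value as above
snr = alpha / (1 - alpha)
I_from_snr   = 0.5 * np.log2(1 + snr)      # channel-capacity form
I_from_alpha = 0.5 * np.log2(1 / (1 - alpha))
print(snr, I_from_snr, I_from_alpha)       # the two information values are identical
```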
Relationship between a, b, α, and the Pearson correlation ρ
Since x and n are independent, the samples of x and n can be viewed as an orthonormal basis for the samples of y, with weights a and b, respectively. This relates the gain parameters to the correlation: the tangent of the angle θ between y and x is just the ratio of the noise gain b to the signal gain a:
$$ \tan(\theta) = \frac{b}{a} = \frac{\sqrt{1-\alpha}}{\sqrt{\alpha}} $$
Then, since the cosine of the angle between two zero-mean signals is their correlation, cos(θ) = ρ, tan(θ) can be expressed in terms of the correlation coefficient ρ:
$$ \tan(\theta) = \frac{\sin(\theta)}{\cos(\theta)} = \frac{\sqrt{1-\cos^2(\theta)}}{\cos(\theta)} = \frac{\sqrt{1-\rho^2}}{\rho} $$
This implies that
$$ \frac{\sqrt{1-\alpha}}{\sqrt{\alpha}} = \frac{\sqrt{1-\rho^2}}{\rho}, $$
which implies that $\alpha = \rho^2$, i.e. $a = \rho$.
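This can also be checked empirically. In the sketch below (Python/NumPy, illustrative sample size), the least-squares gain of z-scored y on z-scored x matches the Pearson correlation:

```python
import numpy as np

rng = np.random.default_rng(1)
alpha = 0.6
N = 200_000
x = rng.standard_normal(N)
y = np.sqrt(alpha) * x + np.sqrt(1 - alpha) * rng.standard_normal(N)

a_hat = np.mean(x * y) / np.mean(x * x)    # least-squares slope for zero-mean data
rho   = np.corrcoef(x, y)[0, 1]
print(a_hat, rho, np.sqrt(alpha))          # all three agree, up to sampling noise
```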
A few more identities
This can be used to relate correlation ρ to SNR and mutual information:
$$ \mathrm{SNR} = \frac{\rho^2}{1-\rho^2} $$
$$ I = \tfrac{1}{2}\log_2\frac{1}{1-\rho^2} = -\tfrac{1}{2}\log_2(1-\rho^2) $$
If $\phi = \sqrt{1-\rho^2}$ is the correlation of y and the noise n (i.e. $\phi$ is the amplitude of the noise contribution to y), then the information is simply $I = -\log_2(\phi)$.
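Collecting these identities, a small set of conversion helpers might look like the following (a sketch in Python/NumPy; the function names are mine, not from any library):

```python
import numpy as np

def snr_from_rho(rho):
    # SNR = rho^2 / (1 - rho^2)
    return rho**2 / (1 - rho**2)

def info_from_rho(rho):
    # mutual information in bits: -0.5 * log2(1 - rho^2)
    return -0.5 * np.log2(1 - rho**2)

def info_from_phi(phi):
    # phi = sqrt(1 - rho^2), the noise amplitude; I = -log2(phi)
    return -np.log2(phi)

rho = 0.8                                  # illustrative value
phi = np.sqrt(1 - rho**2)
print(snr_from_rho(rho), info_from_rho(rho), info_from_phi(phi))
```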
Mean squared error (MSE), here the error of using x directly as a prediction of y, is also related:
$$ \mathrm{MSE} = (1-\rho)^2 + (1-\rho^2) = 2 - 2\rho = 2(1-\rho), $$
which implies that
$$ \rho = 1 - \tfrac{1}{2}\,\mathrm{MSE}, $$
and gives a relationship between mutual information and mean squared error:
$$ I = -\tfrac{1}{2}\log_2(1-\rho^2) = -\tfrac{1}{2}\log_2\!\left(1 - \left(1 - \mathrm{MSE}/2\right)^2\right) $$
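As a final sketch (Python/NumPy, with an arbitrary ρ = 0.8), the MSE of z-scored data recovers both ρ and the mutual information:

```python
import numpy as np

rng = np.random.default_rng(2)
rho = 0.8                                   # illustrative correlation
N = 200_000
x = rng.standard_normal(N)
y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(N)

mse = np.mean((y - x)**2)
print(mse, 2 * (1 - rho))                   # MSE ~ 2*(1 - rho)
print(1 - mse / 2, rho)                     # rho recovered from MSE
print(-0.5 * np.log2(1 - (1 - mse / 2)**2), # I from MSE ...
      -0.5 * np.log2(1 - rho**2))           # ... matches I from rho
```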