limix.qc.normalise_covariance¶
-
limix.qc.
normalise_covariance
(K, out=None)[source]¶ Variance rescaling of covariance matrix
K
.Let \(n\) be the number of rows (or columns) of
K
and let \(m_i\) be the average of the values in the i-th column. Gower rescaling is defined as\[\mathrm K \frac{n - 1}{\text{trace}(\mathrm K) - \sum m_i}.\]It works well with Dask array as log as
out
isNone
.Notes
The reasoning of the scaling is as follows. Let \(\mathbf g\) be a vector of \(n\) independent samples and let \(\mathrm C\) be the Gower’s centering matrix. The unbiased variance estimator is
\[v = \sum_i \frac{(g_i-\overline g)^2}{n-1} =\frac{\mathrm{Tr} [(\mathbf g-\overline g\mathbf 1)^t(\mathbf g-\overline g\mathbf 1)]} {n-1} = \frac{\mathrm{Tr}[\mathrm C\mathbf g\mathbf g^t\mathrm C]}{n-1}\]Let \(\mathrm K\) be the covariance matrix of \(\mathbf g\). The expectation of the unbiased variance estimator is
\[\mathbb E[v] = \frac{\mathrm{Tr}[\mathrm C\mathbb E[\mathbf g\mathbf g^t]\mathrm C]} {n-1} = \frac{\mathrm{Tr}[\mathrm C\mathrm K\mathrm C]}{n-1}\]assuming that \(\mathbb E[g_i]=0\). We thus divide \(\mathrm K\) by \(\mathbb E[v]\) to achieve the desired normalisation.
Parameters: - K (array_like) – Covariance matrix to be normalised.
- out (array_like, optional) – Result destination. Defaults to
None
.
Examples
>>> from numpy import dot, mean, zeros >>> from numpy.random import RandomState >>> from limix.qc import normalise_covariance >>> >>> random = RandomState(0) >>> X = random.randn(10, 10) >>> K = dot(X, X.T) >>> Z = random.multivariate_normal(zeros(10), K, 500) >>> print("%.3f" % mean(Z.var(1, ddof=1))) 9.824 >>> Kn = normalise_covariance(K) >>> Zn = random.multivariate_normal(zeros(10), Kn, 500) >>> print("%.3f" % mean(Zn.var(1, ddof=1))) 1.008