Quantitative trait locus¶
Introduction¶
A linear mixed model (LMM) can be described as
where \(\mathbf u \sim \mathcal N(\mathbf 0, \sigma_u^2\mathrm I)\) is a vector of random effects and \(\epsilon_i\) are iid Normal random variables with zero-mean and variance \(\sigma_e^2\) each. Covariates are defined by the columns of \(\mathrm M\), and \(\mathrm X\) commonly contain all genetic variants of each sample.
The outcome-vector is thus distributed according to
The parameters \(\boldsymbol\beta_0\), \(\sigma_u\), and \(\sigma_{\epsilon}\) are estimated via the maximum likelihood estimation (MLE) approach under the null hypothesis just defined.
The alternative hypothesis for single-variant testing consists in the addition of a fixed-effect size \(\beta_1\):
The parameter \(\beta_1\) multiplies a given vector \(\mathbf g\), typically representing a genetic locus of interest. The parameters \(\boldsymbol\beta_0\), \(\beta_1\), \(\sigma_u\), and \(\sigma_{\epsilon}\) are estimated via MLE under the alternative hypothesis. The comparison of the two marginal likelihoods learnt under the null and alternative hypotheses allows us to perform a likelihood ratio test [LRT].
We now show how to use limix to perform association tests using
linear mixed models.
The outcome-vector is given by y
.
The covariance matrix is defined by the kinship
variable.
We do not provide any covariate.
In that case, the function limix.qtl.scan()
we call will internally add
a covariate of ones to be multiplied by the offset parameter.
Finally, we pass a matrix candidates
of four columns representing four
alternative hypotheses to be tested:
>>> from numpy.random import RandomState
>>> from numpy import dot
>>> from limix.qtl import scan
>>>
>>> random = RandomState(1)
>>>
>>> n = 25
>>>
>>> candidates = (random.rand(n, 4) < 0.2).astype(float)
>>> y = random.randn(n)
>>> X = random.randn(n, 50)
>>> kinship = dot(X, X.T) / float(25)
>>>
>>> model = scan(candidates, y, 'normal', kinship, verbose=False)
>>> print(model.variant_pvalues.to_dataframe())
pv
candidate
0 0.74981
1 0.00537
2 0.07036
3 0.97155
>>> print(model.variant_effsizes.to_dataframe())
effsizes
candidate
0 -0.09660
1 -1.02874
2 -0.46314
3 -0.01155
>>> print(model.variant_effsizes_se.to_dataframe())
effsizes std
candidate
0 0.30293
1 0.36956
2 0.25594
3 0.32377
>>> print(model)
Variants
--------
effsizes effsizes_se pvalues
count 4.00000 4.00000 4.00000
mean -0.40001 0.31305 0.44927
std 0.46269 0.04716 0.48433
min -1.02874 0.25594 0.00537
25% -0.60454 0.29118 0.05411
50% -0.27987 0.31335 0.41008
75% -0.07534 0.33522 0.80524
max -0.01155 0.36956 0.97155
Covariate effect sizes for H0
-----------------------------
0
0.04828
The above example prints the estimated p-value, effect size, and standard
error of the effect size of each variant.
It also shows a summary of the result by printing the variable model
, an
instance of the limix.qtl.QTLModel
class.
A generalised linear mixed model (GLMM) [McC89] [McC11] in an extension of a LMM that allows for residual errors distributed according to an exponential-family distribution [ExFam]. Let us replace \(\mathbf y\) in the LMM equation by \(\mathbf z\), and define the outcome-vector as
The multivariate Normal distribution \(\mathbf z\) is considered a latent (unobserved) variable. The \(\mu_i\) variable is the parameter defining the expected value of a distribution \(\text{ExpFam}(\cdot)\). It is defined via a link function \(g(\cdot)\), which converts the interval of \(z_i\) (real numbers) to the appropriate interval for \(\mu_i\).
The following example applies limix.qtl.scan()
to perform five
likelihood ratio tests for association with an outcome vector y
having
residual errors that follow a Poisson distribution.
The matrix G
defines both the five alternative hypotheses
(the first five columns) and the covariance matrix (the remaining columns).
>>> from numpy import dot, exp, sqrt
>>> from numpy.random import RandomState
>>> from limix.qtl import scan
>>>
>>> random = RandomState(0)
>>>
>>> G = random.randn(25, 50) / sqrt(50)
>>> beta = 0.01 * random.randn(50)
>>>
>>> z = dot(G, beta) + 0.1 * random.randn(25)
>>> z += dot(G[:, 0], 1) # causal SNP
>>>
>>> y = random.poisson(exp(z))
>>>
>>> candidates = G[:, :5]
>>> K = dot(G[:, 5:], G[:, 5:].T)
>>> model = scan(candidates, y, 'poisson', K, verbose=False)
>>>
>>> print(model.variant_pvalues.to_dataframe())
pv
candidate
0 0.19819
1 0.44134
2 0.47341
3 0.21548
4 0.70666
>>> print(model.variant_effsizes.to_dataframe())
effsizes
candidate
0 1.69168
1 -1.00863
2 -1.24902
3 2.04198
4 -0.50974
>>> print(model.variant_effsizes_se.to_dataframe())
effsizes std
candidate
0 1.31470
1 1.31003
2 1.74216
3 1.64859
4 1.3544
>>> print(model)
Variants
--------
effsizes effsizes_se pvalues
count 5.00000 5.00000 5.00000
mean 0.19325 1.47399 0.40702
std 1.55579 0.20552 0.20956
min -1.24902 1.31003 0.19819
25% -1.00863 1.31470 0.21548
50% -0.50974 1.35444 0.44134
75% 1.69168 1.64859 0.47341
max 2.04198 1.74216 0.70666
Covariate effect sizes for H0
-----------------------------
0
-0.00042
Interface¶
-
limix.qtl.
scan
(G, y, lik, K=None, M=None, verbose=True)[source] Single-variant association testing via generalised linear mixed models.
It supports Normal (linear mixed model), Bernoulli, Probit, Binomial, and Poisson residual errors, defined by
lik
. The columns ofG
define the candidates to be tested for association with the phenotypey
. The covariance matrix is set byK
. If not provided, or set toNone
, the generalised linear model without random effects is assumed. The covariates can be set via the parameterM
. We recommend to always provide a column of ones when covariates are actually provided.Parameters: - G (array_like) – \(N\) individuals by \(S\) candidate markers.
- y (tuple, array_like) – Either a tuple of two arrays of \(N\) individuals each (Binomial phenotypes) or an array of \(N\) individuals (Normal, Poisson, Bernoulli, or Probit phenotypes).
- lik ("normal", "bernoulli", "probit", binomial", "poisson") – Sample likelihood describing the residual distribution.
- K (array_like, optional) – \(N\)-by-\(N\) covariance matrix (e.g., kinship coefficients).
Set to
None
for a generalised linear model without random effects. Defaults toNone
. - M (array_like, optional) – N individuals by S covariates.
It will create a \(N\)-by-\(1\) matrix
M
of ones representing the offset covariate ifNone
is passed. If an array is passed, it will used as is. Defaults toNone
. - verbose (bool, optional) –
True
to display progress and summary;False
otherwise.
Returns: QTL representation.
Return type: Examples
>>> from numpy import dot, exp, sqrt, ones >>> from numpy.random import RandomState >>> from pandas import DataFrame >>> import pandas as pd >>> from limix.qtl import scan >>> >>> random = RandomState(1) >>> pd.options.display.float_format = "{:9.6f}".format >>> >>> n = 30 >>> p = 3 >>> samples_index = range(n) >>> >>> M = DataFrame(dict(offset=ones(n), age=random.randint(10, 60, n))) >>> M.index = samples_index >>> >>> X = random.randn(n, 100) >>> K = dot(X, X.T) >>> >>> candidates = random.randn(n, p) >>> candidates = DataFrame(candidates, index=samples_index, ... columns=['rs0', 'rs1', 'rs2']) >>> >>> y = random.poisson(exp(random.randn(n))) >>> >>> model = scan(candidates, y, 'poisson', K, M=M, verbose=False) >>> >>> model.variant_pvalues.to_dataframe() pv candidate rs0 0.554444 rs1 0.218996 rs2 0.552200 >>> model.variant_effsizes.to_dataframe() effsizes candidate rs0 -0.130867 rs1 -0.315078 rs2 -0.143869 >>> model.variant_effsizes_se.to_dataframe() effsizes std candidate rs0 0.221390 rs1 0.256327 rs2 0.242013 >>> model Variants -------- effsizes effsizes_se pvalues count 3.000000 3.000000 3.000000 mean -0.196604 0.239910 0.441880 std 0.102807 0.017563 0.193027 min -0.315077 0.221389 0.218996 25% -0.229473 0.231701 0.385598 50% -0.143869 0.242013 0.552200 75% -0.137367 0.249170 0.553322 max -0.130866 0.256326 0.554443 Covariate effect sizes for H0 ----------------------------- age offset -0.005568 0.395287
Notes
It will raise a
ValueError
exception if non-finite values are passed. Please, refer to thelimix.qc.mean_impute()
function for missing value imputation.
-
class
limix.qtl.
QTLModel
(null_lml, alt_lmls, effsizes, null_covariate_effsizes)[source] Result of a QTL analysis.
An instance of this class is returned by
limix.qtl.scan()
.-
alt_lmls
Log of the marginal likelihoods across tested variants.
Returns: Log of marginal likelihoods. Return type: array_like
-
null_covariate_effsizes
Covariate effect-sizes under the null hypothesis.
Returns: Estimated covariant effect sizes under the null hypothesis. Return type: array_like
-
null_lml
Log of the marginal likelihood under the null hypothesis.
Returns: Log of marginal likelihood. Return type: float
-
variant_effsizes
Variant effect-sizes.
Returns: Estimated variant effect sizes. Return type: array_like
-
variant_effsizes_se
Standard errors of the variant effect sizes.
Returns: Estimated standard errors of the variant effect sizes. Return type: array_like
-
variant_pvalues
Variant p-values.
Returns: Association significance between variant and phenotype. Return type: array_like
-
References
[McC89] | McCullagh, Peter, and John A. Nelder. Generalized linear models. Vol. 37. CRC press, 1989. |
[McC11] | McCulloch, Charles E., and Shayle R. Searle. Generalized, linear, and mixed models. John Wiley & Sons, 2004. |
[ExFam] | Wikipedia contributors. (2018, June 29). Exponential family. In Wikipedia, The Free Encyclopedia. Retrieved 13:47, July 26, 2018, from https://en.wikipedia.org/w/index.php?title=Exponential_family&oldid=848114709 |
[LRT] | Wikipedia contributors. (2018, June 6). Likelihood-ratio test. In Wikipedia, The Free Encyclopedia. Retrieved 13:50, July 26, 2018, from https://en.wikipedia.org/w/index.php?title=Likelihood-ratio_test&oldid=844734768 |