Quantitative trait locus¶

Introduction¶

A linear mixed model (LMM) can be described as

\[\mathbf y = \mathrm M\boldsymbol\beta_0 + \mathrm X\mathbf u + \boldsymbol\epsilon,\]

where \(\mathbf u \sim \mathcal N(\mathbf 0, \sigma_u^2\mathrm I)\) is a vector of random effects and \(\epsilon_i\) are iid Normal random variables with zero-mean and variance \(\sigma_e^2\) each. Covariates are defined by the columns of \(\mathrm M\), and \(\mathrm X\) commonly contain all genetic variants of each sample.

The outcome-vector is thus distributed according to

\[\mathbf y \sim \mathcal N(\mathrm M\boldsymbol\beta_0, \sigma_u^2 \mathrm X \mathrm X^{\intercal} + \sigma_e^2\mathrm I).\]

The parameters \(\boldsymbol\beta_0\), \(\sigma_u\), and \(\sigma_{\epsilon}\) are estimated via the maximum likelihood estimation (MLE) approach under the null hypothesis just defined.

The alternative hypothesis for single-variant testing consists in the addition of a fixed-effect size \(\beta_1\):

\[\mathbf y = \mathrm M\boldsymbol\beta_1 + \mathbf g\beta_1 + \mathrm X\mathbf u + \boldsymbol\epsilon.\]

The parameter \(\beta_1\) multiplies a given vector \(\mathbf g\), typically representing a genetic locus of interest. The parameters \(\boldsymbol\beta_0\), \(\beta_1\), \(\sigma_u\), and \(\sigma_{\epsilon}\) are estimated via MLE under the alternative hypothesis. The comparison of the two marginal likelihoods learnt under the null and alternative hypotheses allows us to perform a likelihood ratio test [LRT].

We now show how to use limix to perform association tests using linear mixed models. The outcome-vector is given by y. The covariance matrix is defined by the kinship variable. We do not provide any covariate. In that case, the function limix.qtl.scan() we call will internally add a covariate of ones to be multiplied by the offset parameter. Finally, we pass a matrix candidates of four columns representing four alternative hypotheses to be tested:

>>> from numpy.random import RandomState
>>> from numpy import dot
>>> from limix.qtl import scan
>>>
>>> random = RandomState(1)
>>>
>>> n = 25
>>>
>>> candidates = (random.rand(n, 4) < 0.2).astype(float)
>>> y = random.randn(n)
>>> X = random.randn(n, 50)
>>> kinship = dot(X, X.T) / float(25)
>>>
>>> model = scan(candidates, y, 'normal', kinship, verbose=False)
>>> print(model.variant_pvalues.to_dataframe()) 
                 pv
candidate
0           0.74981
1           0.00537
2           0.07036
3           0.97155
>>> print(model.variant_effsizes.to_dataframe()) 
            effsizes
candidate
0          -0.09660
1          -1.02874
2          -0.46314
3          -0.01155
>>> print(model.variant_effsizes_se.to_dataframe()) 
           effsizes std
candidate
0               0.30293
1               0.36956
2               0.25594
3               0.32377
>>> print(model) 
Variants
--------
       effsizes  effsizes_se  pvalues
count   4.00000      4.00000  4.00000
mean   -0.40001      0.31305  0.44927
std     0.46269      0.04716  0.48433
min    -1.02874      0.25594  0.00537
25%    -0.60454      0.29118  0.05411
50%    -0.27987      0.31335  0.41008
75%    -0.07534      0.33522  0.80524
max    -0.01155      0.36956  0.97155

Covariate effect sizes for H0
-----------------------------
 0
0.04828

The above example prints the estimated p-value, effect size, and standard error of the effect size of each variant. It also shows a summary of the result by printing the variable model, an instance of the limix.qtl.QTLModel class.

A generalised linear mixed model (GLMM) [McC89] [McC11] in an extension of a LMM that allows for residual errors distributed according to an exponential-family distribution [ExFam]. Let us replace \(\mathbf y\) in the LMM equation by \(\mathbf z\), and define the outcome-vector as

\[y_i ~|~ z_i \sim \text{ExpFam}(\mu_i = g(z_i)).\]

The multivariate Normal distribution \(\mathbf z\) is considered a latent (unobserved) variable. The \(\mu_i\) variable is the parameter defining the expected value of a distribution \(\text{ExpFam}(\cdot)\). It is defined via a link function \(g(\cdot)\), which converts the interval of \(z_i\) (real numbers) to the appropriate interval for \(\mu_i\).

The following example applies limix.qtl.scan() to perform five likelihood ratio tests for association with an outcome vector y having residual errors that follow a Poisson distribution. The matrix G defines both the five alternative hypotheses (the first five columns) and the covariance matrix (the remaining columns).

>>> from numpy import dot, exp, sqrt
>>> from numpy.random import RandomState
>>> from limix.qtl import scan
>>>
>>> random = RandomState(0)
>>>
>>> G = random.randn(25, 50) / sqrt(50)
>>> beta = 0.01 * random.randn(50)
>>>
>>> z = dot(G, beta) + 0.1 * random.randn(25)
>>> z += dot(G[:, 0], 1) # causal SNP
>>>
>>> y = random.poisson(exp(z))
>>>
>>> candidates = G[:, :5]
>>> K = dot(G[:, 5:], G[:, 5:].T)
>>> model = scan(candidates, y, 'poisson', K, verbose=False)
>>>
>>> print(model.variant_pvalues.to_dataframe()) 
                pv
candidate
0          0.19819
1          0.44134
2          0.47341
3          0.21548
4          0.70666
>>> print(model.variant_effsizes.to_dataframe()) 
           effsizes
candidate
0           1.69168
1          -1.00863
2          -1.24902
3           2.04198
4          -0.50974
>>> print(model.variant_effsizes_se.to_dataframe()) 
           effsizes std
candidate
0               1.31470
1               1.31003
2               1.74216
3               1.64859
4               1.3544
>>> print(model) 
Variants
--------
       effsizes  effsizes_se  pvalues
count   5.00000      5.00000  5.00000
mean    0.19325      1.47399  0.40702
std     1.55579      0.20552  0.20956
min    -1.24902      1.31003  0.19819
25%    -1.00863      1.31470  0.21548
50%    -0.50974      1.35444  0.44134
75%     1.69168      1.64859  0.47341
max     2.04198      1.74216  0.70666

Covariate effect sizes for H0
-----------------------------
 0
   -0.00042

Interface¶

limix.qtl.scan(G, y, lik, K=None, M=None, verbose=True)[source]

Single-variant association testing via generalised linear mixed models.

It supports Normal (linear mixed model), Bernoulli, Probit, Binomial, and Poisson residual errors, defined by lik. The columns of G define the candidates to be tested for association with the phenotype y. The covariance matrix is set by K. If not provided, or set to None, the generalised linear model without random effects is assumed. The covariates can be set via the parameter M. We recommend to always provide a column of ones when covariates are actually provided.

Parameters:	G (array_like) – \(N\) individuals by \(S\) candidate markers. y (tuple, array_like) – Either a tuple of two arrays of \(N\) individuals each (Binomial phenotypes) or an array of \(N\) individuals (Normal, Poisson, Bernoulli, or Probit phenotypes). lik ("normal", "bernoulli", "probit", binomial", "poisson") – Sample likelihood describing the residual distribution. K (array_like, optional) – \(N\)-by-\(N\) covariance matrix (e.g., kinship coefficients). Set to `None` for a generalised linear model without random effects. Defaults to `None`. M (array_like, optional) – N individuals by S covariates. It will create a \(N\)-by-\(1\) matrix `M` of ones representing the offset covariate if `None` is passed. If an array is passed, it will used as is. Defaults to `None`. verbose (bool, optional) – `True` to display progress and summary; `False` otherwise.
Returns:	QTL representation.
Return type:	`limix.qtl.QTLModel`

Examples

>>> from numpy import dot, exp, sqrt, ones
>>> from numpy.random import RandomState
>>> from pandas import DataFrame
>>> import pandas as pd
>>> from limix.qtl import scan
>>>
>>> random = RandomState(1)
>>> pd.options.display.float_format = "{:9.6f}".format
>>>
>>> n = 30
>>> p = 3
>>> samples_index = range(n)
>>>
>>> M = DataFrame(dict(offset=ones(n), age=random.randint(10, 60, n)))
>>> M.index = samples_index
>>>
>>> X = random.randn(n, 100)
>>> K = dot(X, X.T)
>>>
>>> candidates = random.randn(n, p)
>>> candidates = DataFrame(candidates, index=samples_index,
...                                    columns=['rs0', 'rs1', 'rs2'])
>>>
>>> y = random.poisson(exp(random.randn(n)))
>>>
>>> model = scan(candidates, y, 'poisson', K, M=M, verbose=False)
>>>
>>> model.variant_pvalues.to_dataframe()  
                 pv
candidate
rs0        0.554444
rs1        0.218996
rs2        0.552200
>>> model.variant_effsizes.to_dataframe()  
           effsizes
candidate
rs0       -0.130867
rs1       -0.315078
rs2       -0.143869
>>> model.variant_effsizes_se.to_dataframe()  
           effsizes std
candidate
rs0            0.221390
rs1            0.256327
rs2            0.242013
>>> model  
Variants
--------
       effsizes  effsizes_se   pvalues
count  3.000000     3.000000  3.000000
mean  -0.196604     0.239910  0.441880
std    0.102807     0.017563  0.193027
min   -0.315077     0.221389  0.218996
25%   -0.229473     0.231701  0.385598
50%   -0.143869     0.242013  0.552200
75%   -0.137367     0.249170  0.553322
max   -0.130866     0.256326  0.554443

Covariate effect sizes for H0
-----------------------------
      age    offset
-0.005568  0.395287

Notes

It will raise a ValueError exception if non-finite values are passed. Please, refer to the limix.qc.mean_impute() function for missing value imputation.

class limix.qtl.QTLModel(null_lml, alt_lmls, effsizes, null_covariate_effsizes)[source]

Result of a QTL analysis.

An instance of this class is returned by limix.qtl.scan().

alt_lmls

Log of the marginal likelihoods across tested variants.

Returns:	Log of marginal likelihoods.
Return type:	array_like

null_covariate_effsizes

Covariate effect-sizes under the null hypothesis.

Returns:	Estimated covariant effect sizes under the null hypothesis.
Return type:	array_like

null_lml

Log of the marginal likelihood under the null hypothesis.

Returns:	Log of marginal likelihood.
Return type:	float

variant_effsizes

Variant effect-sizes.

Returns:	Estimated variant effect sizes.
Return type:	array_like

variant_effsizes_se

Standard errors of the variant effect sizes.

Returns:	Estimated standard errors of the variant effect sizes.
Return type:	array_like

variant_pvalues

Variant p-values.

Returns:	Association significance between variant and phenotype.
Return type:	array_like

References

[McC89]

McCullagh, Peter, and John A. Nelder. Generalized linear models. Vol. 37. CRC press, 1989.

[McC11]

McCulloch, Charles E., and Shayle R. Searle. Generalized, linear, and mixed models. John Wiley & Sons, 2004.

[ExFam]

Wikipedia contributors. (2018, June 29). Exponential family. In Wikipedia, The Free Encyclopedia. Retrieved 13:47, July 26, 2018, from https://en.wikipedia.org/w/index.php?title=Exponential_family&oldid=848114709

[LRT]

Wikipedia contributors. (2018, June 6). Likelihood-ratio test. In Wikipedia, The Free Encyclopedia. Retrieved 13:50, July 26, 2018, from https://en.wikipedia.org/w/index.php?title=Likelihood-ratio_test&oldid=844734768