limix.qtl.scan

limix.qtl.scan(G, y, lik, K=None, M=None, verbose=True)

Single-variant association testing via generalised linear mixed models.

It supports Normal (linear mixed model), Bernoulli, Probit, Binomial, and Poisson residual errors, selected by lik. Each column of G is a candidate to be tested for association with the phenotype y. The covariance matrix is set by K; if K is not provided, or is set to None, a generalised linear model without random effects is assumed. Covariates are set via the parameter M. When providing covariates, we recommend always including a column of ones.
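As a minimal sketch of the recommendation above, a covariate matrix with an explicit column of ones can be assembled with NumPy. The `age` covariate and the sample size are illustrative assumptions, not values required by the function.

```python
import numpy as np

rng = np.random.RandomState(0)
n = 30  # number of individuals (illustrative)

# Hypothetical covariate: one age value per individual.
age = rng.randint(10, 60, n)

# Put a column of ones (the offset) next to the covariate,
# giving an n-by-2 covariate matrix.
M = np.column_stack([np.ones(n), age])

print(M.shape)  # (30, 2)
```

The resulting array could then be passed as the M argument.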

Parameters:
  • G (array_like) – \(N\) individuals by \(S\) candidate markers.
  • y (tuple, array_like) – Either a tuple of two arrays of \(N\) individuals each (Binomial phenotypes) or an array of \(N\) individuals (Normal, Poisson, Bernoulli, or Probit phenotypes).
  • lik ("normal", "bernoulli", "probit", "binomial", "poisson") – Sample likelihood describing the residual distribution.
  • K (array_like, optional) – \(N\)-by-\(N\) covariance matrix (e.g., kinship coefficients). Set to None for a generalised linear model without random effects. Defaults to None.
  • M (array_like, optional) – \(N\) individuals by \(D\) covariates. If None is passed, an \(N\)-by-\(1\) matrix of ones is created to represent the offset covariate. If an array is passed, it will be used as is. Defaults to None.
  • verbose (bool, optional) – True to display progress and summary; False otherwise. Defaults to True.
Returns:

QTL representation.

Return type:

limix.qtl.QTLModel
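The shapes listed under Parameters can be checked before calling the function. The helper below, `check_scan_shapes`, is hypothetical and not part of limix; it is a sketch that only encodes the dimensions stated in the parameter descriptions.

```python
import numpy as np

def check_scan_shapes(G, y, K=None, M=None):
    """Hypothetical helper: check the dimensions described under Parameters.

    Returns the number of individuals N, or raises ValueError on a mismatch.
    """
    G = np.asarray(G)
    n = G.shape[0]  # N individuals (rows of G)

    # y is either one array of length N or a tuple of two such arrays.
    if isinstance(y, tuple):
        if len(y) != 2:
            raise ValueError("A tuple phenotype must contain two arrays.")
        lengths = [np.asarray(a).shape[0] for a in y]
    else:
        lengths = [np.asarray(y).shape[0]]
    if any(length != n for length in lengths):
        raise ValueError("y must have one entry per individual.")

    # K, when given, is N-by-N.
    if K is not None and np.asarray(K).shape != (n, n):
        raise ValueError("K must be an N-by-N matrix.")

    # M, when given, has one row per individual.
    if M is not None and np.asarray(M).shape[0] != n:
        raise ValueError("M must have N rows.")

    return n

n_individuals = check_scan_shapes(
    np.zeros((5, 3)), np.zeros(5), K=np.eye(5), M=np.ones((5, 1))
)
print(n_individuals)  # 5
```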

Examples

>>> from numpy import dot, exp, sqrt, ones
>>> from numpy.random import RandomState
>>> from pandas import DataFrame
>>> import pandas as pd
>>> from limix.qtl import scan
>>>
>>> random = RandomState(1)
>>> pd.options.display.float_format = "{:9.6f}".format
>>>
>>> n = 30
>>> p = 3
>>> samples_index = range(n)
>>>
>>> M = DataFrame(dict(offset=ones(n), age=random.randint(10, 60, n)))
>>> M.index = samples_index
>>>
>>> X = random.randn(n, 100)
>>> K = dot(X, X.T)
>>>
>>> candidates = random.randn(n, p)
>>> candidates = DataFrame(candidates, index=samples_index,
...                                    columns=['rs0', 'rs1', 'rs2'])
>>>
>>> y = random.poisson(exp(random.randn(n)))
>>>
>>> model = scan(candidates, y, 'poisson', K, M=M, verbose=False)
>>>
>>> model.variant_pvalues.to_dataframe()  
                 pv
candidate
rs0        0.554444
rs1        0.218996
rs2        0.552200
>>> model.variant_effsizes.to_dataframe()  
           effsizes
candidate
rs0       -0.130867
rs1       -0.315078
rs2       -0.143869
>>> model.variant_effsizes_se.to_dataframe()  
           effsizes std
candidate
rs0            0.221390
rs1            0.256327
rs2            0.242013
>>> model  
Variants
--------
       effsizes  effsizes_se   pvalues
count  3.000000     3.000000  3.000000
mean  -0.196604     0.239910  0.441880
std    0.102807     0.017563  0.193027
min   -0.315077     0.221389  0.218996
25%   -0.229473     0.231701  0.385598
50%   -0.143869     0.242013  0.552200
75%   -0.137367     0.249170  0.553322
max   -0.130866     0.256326  0.554443

Covariate effect sizes for H0
-----------------------------
      age    offset
-0.005568  0.395287

Notes

A ValueError exception is raised if non-finite values are passed. Please refer to the limix.qc.mean_impute() function for missing value imputation.
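To illustrate the idea behind mean imputation, the sketch below replaces each missing entry with the mean of the observed values in its column, using plain NumPy. The helper `mean_impute_columns` is hypothetical and is not the implementation of limix.qc.mean_impute(), whose exact behaviour may differ. It handles NaN only, not infinite values, and assumes every column has at least one observed value.

```python
import numpy as np

def mean_impute_columns(G):
    """Hypothetical helper: fill NaNs with the per-column mean of observed values."""
    G = np.array(G, dtype=float)  # work on a float copy
    col_means = np.nanmean(G, axis=0)  # column means, ignoring NaNs
    rows, cols = np.where(np.isnan(G))
    G[rows, cols] = col_means[cols]
    return G

G = np.array([[0.0, 1.0],
              [np.nan, 2.0],
              [2.0, np.nan]])

# Column means of the observed values are 1.0 and 1.5.
G_imputed = mean_impute_columns(G)

# After imputation every entry is finite.
assert np.all(np.isfinite(G_imputed))
```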