limix.qtl.scan

limix.qtl.scan(G, y, lik, K=None, M=None, verbose=True)

Single-variant association testing via generalised linear mixed models.
It supports Normal (linear mixed model), Bernoulli, Probit, Binomial, and Poisson residual errors, defined by lik. The columns of G define the candidates to be tested for association with the phenotype y. The covariance matrix is set by K. If it is not provided, or is set to None, a generalised linear model without random effects is assumed. The covariates can be set via the parameter M. We recommend always including a column of ones whenever covariates are provided.

Parameters:
- G (array_like) – \(N\) individuals by \(S\) candidate markers.
- y (tuple, array_like) – Either a tuple of two arrays of \(N\) individuals each (Binomial phenotypes) or an array of \(N\) individuals (Normal, Poisson, Bernoulli, or Probit phenotypes).
- lik ("normal", "bernoulli", "probit", "binomial", "poisson") – Sample likelihood describing the residual distribution.
- K (array_like, optional) – \(N\)-by-\(N\) covariance matrix (e.g., kinship coefficients). Set to None for a generalised linear model without random effects. Defaults to None.
- M (array_like, optional) – \(N\) individuals by \(S\) covariates. If None is passed, an \(N\)-by-\(1\) matrix M of ones is created to represent the offset covariate. If an array is passed, it will be used as is. Defaults to None.
- verbose (bool, optional) – True to display progress and summary; False otherwise.
Returns: QTL representation.
Return type:

Examples
>>> from numpy import dot, exp, sqrt, ones
>>> from numpy.random import RandomState
>>> from pandas import DataFrame
>>> import pandas as pd
>>> from limix.qtl import scan
>>>
>>> random = RandomState(1)
>>> pd.options.display.float_format = "{:9.6f}".format
>>>
>>> n = 30
>>> p = 3
>>> samples_index = range(n)
>>>
>>> M = DataFrame(dict(offset=ones(n), age=random.randint(10, 60, n)))
>>> M.index = samples_index
>>>
>>> X = random.randn(n, 100)
>>> K = dot(X, X.T)
>>>
>>> candidates = random.randn(n, p)
>>> candidates = DataFrame(candidates, index=samples_index,
...                        columns=['rs0', 'rs1', 'rs2'])
>>>
>>> y = random.poisson(exp(random.randn(n)))
>>>
>>> model = scan(candidates, y, 'poisson', K, M=M, verbose=False)
>>>
>>> model.variant_pvalues.to_dataframe()
                 pv
candidate
rs0        0.554444
rs1        0.218996
rs2        0.552200
>>> model.variant_effsizes.to_dataframe()
           effsizes
candidate
rs0       -0.130867
rs1       -0.315078
rs2       -0.143869
>>> model.variant_effsizes_se.to_dataframe()
           effsizes std
candidate
rs0            0.221390
rs1            0.256327
rs2            0.242013
>>> model
Variants
--------
        effsizes  effsizes_se   pvalues
count   3.000000     3.000000  3.000000
mean   -0.196604     0.239910  0.441880
std     0.102807     0.017563  0.193027
min    -0.315077     0.221389  0.218996
25%    -0.229473     0.231701  0.385598
50%    -0.143869     0.242013  0.552200
75%    -0.137367     0.249170  0.553322
max    -0.130866     0.256326  0.554443

Covariate effect sizes for H0
-----------------------------
      age    offset
-0.005568  0.395287
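The recommendation to include a column of ones whenever covariates are provided can be sketched with NumPy alone. The helper name add_offset_column and the sample ages below are hypothetical and are not part of limix; the sketch only illustrates one way to prepare an array that could then be passed as M.

```python
import numpy as np


def add_offset_column(covariates):
    """Prepend a column of ones to a covariate array (hypothetical helper)."""
    covariates = np.asarray(covariates, dtype=float)
    # Promote a single covariate given as a 1-D array to an N-by-1 matrix.
    if covariates.ndim == 1:
        covariates = covariates[:, np.newaxis]
    offset = np.ones((covariates.shape[0], 1))
    # First column is the offset, the remaining columns are the covariates.
    return np.hstack([offset, covariates])


# Made-up ages for four individuals, for illustration only.
age = np.array([34.0, 51.0, 27.0, 45.0])
M = add_offset_column(age)
```

The resulting M has one row per individual, with the offset in its first column and age in its second, mirroring the offset and age columns of the DataFrame built in the example above.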
Notes

It will raise a ValueError exception if non-finite values are passed. Please refer to the limix.qc.mean_impute() function for missing value imputation.
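To illustrate what column-wise mean imputation does before a scan, here is a minimal NumPy-only sketch. It is a stand-in written for illustration, not the implementation of limix.qc.mean_impute(); the function name mean_impute_sketch and the toy matrix are assumptions.

```python
import numpy as np


def mean_impute_sketch(X):
    """Replace NaNs by the mean of the observed values in the same column.

    Illustrative stand-in only; not the limix implementation.
    """
    X = np.array(X, dtype=float)  # work on a copy, leave the input untouched
    col_means = np.nanmean(X, axis=0)  # per-column mean, ignoring NaNs
    rows, cols = np.nonzero(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X


# Toy candidate matrix with one missing entry per column.
G = np.array([[0.0, 1.0],
              [np.nan, 2.0],
              [2.0, np.nan]])
G_filled = mean_impute_sketch(G)

# Check finiteness up front, since non-finite values are rejected.
all_finite = bool(np.isfinite(G_filled).all())
```

A column made up entirely of NaNs would stay NaN under this sketch, so such columns would need to be dropped or handled separately.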