Developers

Data type

limix._dataset._normalise_dataset(y, M=None, G=None, K=None)

Convert data types to DataArray.

This is a fundamental function for limix: it standardises the outcome, covariates, candidates, and kinship arrays into the xarray.DataArray data type. Data arrays are numpy/dask arrays with indexed coordinates, and therefore generalise pandas data frames. They allow lazy loading of data via dask arrays. The function also supports arrays of different dimensionality and type, mixtures of indexed and non-indexed arrays, and repeated sample labels.
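To make the notion of "arrays with indexed coordinates" concrete, the following minimal sketch builds two data arrays directly with xarray, without calling limix. The variable names, trait labels, and values are illustrative assumptions, not something limix produces. It shows a labelled array with a repeated sample label next to an array that has no coordinates at all.

```python
import numpy as np
import xarray as xr

# Illustrative outcome array: 4 samples x 2 traits, with "sample0" repeated.
values = np.arange(8.0).reshape(4, 2)
y = xr.DataArray(
    values,
    dims=["sample", "trait"],
    coords={
        "sample": ["sample0", "sample0", "sample1", "sample2"],
        "trait": ["trait0", "trait1"],  # hypothetical trait labels
    },
    name="outcome",
)

# Select every row labelled "sample0" through a boolean mask on the coordinate.
mask = (y["sample"] == "sample0").values
rows = y.isel(sample=mask)
print(rows.shape)  # (2, 2)

# A non-indexed array: it has named dimensions but no coordinate labels.
G = xr.DataArray(np.zeros((4, 3)), dims=["sample", "candidate"])
print("sample" in G.coords)  # False
```

A labelled array such as `y` and an unlabelled one such as `G` are the kind of mixture the function is described as accepting.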

Examples

>>> from __future__ import unicode_literals
>>> import pytest
>>> from numpy.random import RandomState
>>> from pandas import DataFrame
>>> from xarray import DataArray
>>> from limix._dataset import _normalise_dataset
>>>
>>> random = RandomState(0)
>>>
>>> y = random.randn(4)
>>> y = DataFrame(y, index=["sample0", "sample0", "sample1", "sample2"])
>>>
>>> G = random.randn(5, 6)
>>>
>>> data = _normalise_dataset(y, G=G)
>>> print(data["y"])
<xarray.DataArray 'outcome' (sample: 4, trait: 1)>
array([[1.764052],
       [0.400157],
       [0.978738],
       [2.240893]])
Coordinates:
  * sample   (sample) object 'sample0' 'sample0' 'sample1' 'sample2'
  * trait    (trait) int64 0
>>> print(data["G"])
<xarray.DataArray 'candidates' (sample: 4, candidate: 6)>
array([[ 1.867558, -0.977278,  0.950088, -0.151357, -0.103219,  0.410599],
       [ 0.144044,  1.454274,  0.761038,  0.121675,  0.443863,  0.333674],
       [ 1.494079, -0.205158,  0.313068, -0.854096, -2.55299 ,  0.653619],
       [ 0.864436, -0.742165,  2.269755, -1.454366,  0.045759, -0.187184]])
Coordinates:
  * sample   (sample) object 'sample0' 'sample0' 'sample1' 'sample2'
Dimensions without coordinates: candidate
>>> K = random.randn(3, 3)
>>> K = K.dot(K.T)
>>> K = DataArray(K)
>>> K.coords["dim_0"] = ["sample0", "sample1", "sample2"]
>>> K.coords["dim_1"] = ["sample0", "sample1", "sample2"]
>>>
>>> data = _normalise_dataset(y, K=K)
>>> print(data["y"])
<xarray.DataArray 'outcome' (sample: 4, trait: 1)>
array([[1.764052],
       [0.400157],
       [0.978738],
       [2.240893]])
Coordinates:
  * sample   (sample) object 'sample0' 'sample0' 'sample1' 'sample2'
  * trait    (trait) int64 0
>>> print(data["K"])
<xarray.DataArray 'variance-covariance' (sample_0: 4, sample_1: 4)>
array([[ 1.659103,  1.659103, -0.850801, -1.956422],
       [ 1.659103,  1.659103, -0.850801, -1.956422],
       [-0.850801, -0.850801,  1.687126, -0.194938],
       [-1.956422, -1.956422, -0.194938,  6.027272]])
Coordinates:
  * sample_0  (sample_0) <U7 'sample0' 'sample0' 'sample1' 'sample2'
  * sample_1  (sample_1) <U7 'sample0' 'sample0' 'sample1' 'sample2'
>>> with pytest.raises(ValueError):
...     _normalise_dataset(y, G=G, K=K)
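
In the second example above, a 3×3 K comes back as a 4×4 array whose first two rows and columns are identical, because the sample labels of y repeat "sample0". The sketch below reproduces that kind of label-based expansion with plain NumPy indexing. It illustrates the idea only and is not limix's implementation; `expand_by_labels` is a hypothetical helper and the matrix values are made up.

```python
import numpy as np


def expand_by_labels(K, K_labels, target_labels):
    """Hypothetical helper: reindex a square matrix by (possibly repeated) labels."""
    # Map each label of K to its row/column position.
    position = {label: i for i, label in enumerate(K_labels)}
    idx = np.array([position[label] for label in target_labels])
    # np.ix_ builds the open mesh that selects rows and columns together.
    return K[np.ix_(idx, idx)]


# Made-up symmetric matrix labelled by three unique samples.
K = np.array([
    [1.0, 0.2, 0.3],
    [0.2, 2.0, 0.4],
    [0.3, 0.4, 3.0],
])
K_labels = ["sample0", "sample1", "sample2"]
y_labels = ["sample0", "sample0", "sample1", "sample2"]

K_expanded = expand_by_labels(K, K_labels, y_labels)
print(K_expanded.shape)  # (4, 4)
```

Rows 0 and 1 of `K_expanded` are equal, mirroring the duplicated "sample0" rows in the printed `data["K"]`.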