
Here is a toy example. I have two DataFrames: (1) rows = samples, columns = attributes; and (2) rows = samples, columns = metadata fields.

I want to concatenate or merge them to create a 3-dimensional xr.DataArray. I've done this multiple times before, but I can't figure out why it's not working in this case. I want to concatenate along the patient_id axis to get a 3D xr.DataArray.

Why isn't xr.concat building the 3-dimensional DataArray? I think I'm using the dim argument incorrectly, since it is supposed to concatenate along a new axis. Is there a way to do this along an existing axis?
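For the existing-vs-new-axis part of the question, here is a minimal sketch with small made-up arrays (the names a, b, c and the dimension name batch are hypothetical, not from the question). When all inputs already share the same dimensions, xr.concat can join them along an existing dimension, or stack them along a dimension name that does not exist yet, which creates a new axis:

```python
import numpy as np
import xarray as xr

# Two small arrays that share BOTH dimensions ("patient_id", "attribute")
a = xr.DataArray(
    np.zeros((2, 3)),
    dims=["patient_id", "attribute"],
    coords={"patient_id": ["p0", "p1"], "attribute": ["x", "y", "z"]},
)
b = xr.DataArray(
    np.ones((2, 3)),
    dims=["patient_id", "attribute"],
    coords={"patient_id": ["p2", "p3"], "attribute": ["x", "y", "z"]},
)

# Existing dimension: the arrays are joined end to end along "patient_id"
along_existing = xr.concat([a, b], dim="patient_id")
print(along_existing.dims, along_existing.shape)

# New dimension: a name absent from the inputs is added as a leading axis.
# c reuses a's patient labels so the two inputs align exactly.
c = b.assign_coords(patient_id=["p0", "p1"])
along_new = xr.concat([a, c], dim="batch")
print(along_new.dims, along_new.shape)
```

Both calls work here only because a, b and c carry the same set of dimensions; DA_A and DA_B below do not.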

I'm trying to use the method from "Create DataArray from Dict of 2D DataFrames/Arrays", but it isn't working. I got merge to work, but it puts the result into a Dataset with 2 data variables.

import numpy as np
import pandas as pd
import xarray as xr

np.random.seed(0)
patient_ids = ["patient_%d" % i for i in range(42)]
attr_ids = ["attr_%d" % i for i in range(481)]
meta_ids = ["meta_%d" % i for i in range(32)]

# 2D array: rows = patients, columns = attributes
DA_A = xr.DataArray(pd.DataFrame(np.random.random((42, 481)),
                                 index=patient_ids,
                                 columns=attr_ids),
                    dims=["patient_id", "attribute"])
# 2D array: rows = patients, columns = metadata fields
DA_B = xr.DataArray(pd.DataFrame(np.random.random((42, 32)),
                                 index=patient_ids,
                                 columns=meta_ids),
                    dims=["patient_id", "metadata"])
DA_A.coords
# Coordinates:
#   * patient_id  (patient_id) object 'patient_0' 'patient_1' 'patient_2' ...
#   * attribute   (attribute) object 'attr_0' 'attr_1' 'attr_2' 'attr_3' ...
DA_B.coords
# Coordinates:
#   * patient_id  (patient_id) object 'patient_0' 'patient_1' 'patient_2' ...
#   * metadata    (metadata) object 'meta_0' 'meta_1' 'meta_2' 'meta_3' ...
xr.concat([DA_A, DA_B], dim="patient_id")
# KeyError: 'attribute'
Comment: I don't think you want to use concat. The error you are getting is telling you that DA_B doesn't have the attribute dimension. Based on what I see you trying to do, a Dataset with shared/aligned dimensions (patient_id, attribute, metadata) is probably what you want. (Commented Aug 14, 2016 at 18:55)
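The mismatch the comment describes can be checked directly by comparing dimension names. A minimal sketch, using smaller stand-ins for DA_A and DA_B (3 patients and a handful of columns instead of the question's 42 × 481 and 42 × 32):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Smaller stand-ins for DA_A and DA_B, built the same way as in the question
patient_ids = ["patient_%d" % i for i in range(3)]
DA_A = xr.DataArray(
    pd.DataFrame(np.random.random((3, 4)), index=patient_ids,
                 columns=["attr_%d" % i for i in range(4)]),
    dims=["patient_id", "attribute"],
)
DA_B = xr.DataArray(
    pd.DataFrame(np.random.random((3, 2)), index=patient_ids,
                 columns=["meta_%d" % i for i in range(2)]),
    dims=["patient_id", "metadata"],
)

# Dimensions present in one array but missing from the other
only_in_A = set(DA_A.dims) - set(DA_B.dims)
only_in_B = set(DA_B.dims) - set(DA_A.dims)
print(only_in_A, only_in_B)
```

Any non-empty difference here means the two arrays do not have matching dimensions, which is the situation the comment points to.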

Answer

You can't (yet) concatenate DataArrays with different dimensions. You need to broadcast them explicitly first, e.g.,

In [38]: xr.concat(xr.broadcast(DA_A, DA_B), dim="patient_id")
Out[38]:
<xarray.DataArray (patient_id: 84, attribute: 481, metadata: 32)>
array([[[ 0.5488135 ,  0.5488135 ,  0.5488135 , ...,  0.5488135 ,
          0.5488135 ,  0.5488135 ],
        ...,
        [ 0.79649197,  0.97094708,  0.95542135, ...,  0.37856775,
          0.65855316,  0.37893685]]])
Coordinates:
  * attribute   (attribute) object 'attr_0' 'attr_1' 'attr_2' 'attr_3' ...
  * metadata    (metadata) object 'meta_0' 'meta_1' 'meta_2' 'meta_3' ...
  * patient_id  (patient_id) object 'patient_0' 'patient_1' 'patient_2' ...
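The broadcast step is easier to inspect on small stand-ins for DA_A and DA_B (a sketch, not the answer's exact data). Concatenating along the existing patient_id dimension stacks the two broadcast arrays end to end, which is why the output above shows patient_id: 84 rather than 42: every patient label appears twice. If unique patient labels matter, concatenating along a new dimension is one alternative; the dimension name source used below is hypothetical.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)
patient_ids = ["patient_%d" % i for i in range(3)]
DA_A = xr.DataArray(
    rng.random((3, 4)),
    dims=["patient_id", "attribute"],
    coords={"patient_id": patient_ids,
            "attribute": ["attr_%d" % i for i in range(4)]},
)
DA_B = xr.DataArray(
    rng.random((3, 2)),
    dims=["patient_id", "metadata"],
    coords={"patient_id": patient_ids,
            "metadata": ["meta_%d" % i for i in range(2)]},
)

# broadcast() gives both arrays the union of dimensions; DA_A's values are
# repeated along "metadata" and DA_B's values are repeated along "attribute"
A_b, B_b = xr.broadcast(DA_A, DA_B)
print(dict(A_b.sizes), dict(B_b.sizes))

# Concatenating along the existing "patient_id" stacks them: 3 + 3 = 6 rows,
# so each patient label now appears twice in the index
stacked = xr.concat([A_b, B_b], dim="patient_id")
print(stacked.sizes["patient_id"], stacked.indexes["patient_id"].is_unique)

# Alternative: a new (hypothetical) "source" dimension keeps patient_id unique
by_source = xr.concat([A_b, B_b], dim="source")
print(by_source.dims)
```

Note that the broadcast copies are redundant by construction (DA_A's values do not vary along metadata), so the 3D result carries no information beyond the two original 2D arrays.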

But as jhamman mentions in the comment on your question, you might actually find it easier to work with a single Dataset object instead, with two different variables, e.g.,

In [39]: xr.Dataset({'A': DA_A, 'B': DA_B})
Out[39]:
<xarray.Dataset>
Dimensions:     (attribute: 481, metadata: 32, patient_id: 42)
Coordinates:
  * patient_id  (patient_id) object 'patient_0' 'patient_1' 'patient_2' ...
  * attribute   (attribute) object 'attr_0' 'attr_1' 'attr_2' 'attr_3' ...
  * metadata    (metadata) object 'meta_0' 'meta_1' 'meta_2' 'meta_3' ...
Data variables:
    A           (patient_id, attribute) float64 0.5488 0.7152 0.6028 0.5449 ...
    B           (patient_id, metadata) float64 0.2438 0.8216 0.9237 0.3999 ...
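The question mentions that merge produced a Dataset with two data variables; that is the same structure as the Dataset built here. A minimal sketch on small stand-ins suggests the two constructions agree, and shows how one patient's attributes and metadata can be selected together:

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)
patient_ids = ["patient_%d" % i for i in range(3)]
DA_A = xr.DataArray(
    rng.random((3, 4)),
    dims=["patient_id", "attribute"],
    coords={"patient_id": patient_ids,
            "attribute": ["attr_%d" % i for i in range(4)]},
)
DA_B = xr.DataArray(
    rng.random((3, 2)),
    dims=["patient_id", "metadata"],
    coords={"patient_id": patient_ids,
            "metadata": ["meta_%d" % i for i in range(2)]},
)

# Each variable keeps its own dimensions; "patient_id" is shared and aligned
ds = xr.Dataset({"A": DA_A, "B": DA_B})

# xr.merge on named DataArrays builds an equivalent Dataset
merged = xr.merge([DA_A.rename("A"), DA_B.rename("B")])
print(merged.identical(ds))

# Selecting one patient returns its attributes and metadata together
one = ds.sel(patient_id="patient_0")
print(one["A"].dims, one["B"].dims)
```

Unlike the broadcast approach, nothing is duplicated here: A stays (patient_id, attribute) and B stays (patient_id, metadata).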


Thanks again @Stephan, you've been really helpful with xarray. I've found ways to implement it into my everyday pipeline. I've been trying to spread the word to the bioinformaticians I collaborate with.
