I have an existing DataFrame with metadata and am now trying to add a column holding the data itself. For each row in the DataFrame, I want to store a subset of my xarray.Dataset in that column. However, pandas seems to try to convert the xr.Dataset into a NumPy array, which fails. Is there any way to do this?
Here is some example code:
    import pandas as pd
    import xarray as xr
    import numpy as np

    # Create a DataFrame
    df = pd.DataFrame({"id": [1, 2, 3]})

    # Initialize an empty column with dtype=object (CRUCIAL!)
    df["xr_dataset"] = None  # Automatically becomes object dtype
    # Or explicitly:
    df["xr_dataset"] = pd.Series(dtype=object)

    for idx in df.index:
        # Create a unique xarray Dataset for each row
        ds = xr.Dataset(
            {
                "temperature": xr.DataArray(np.random.rand(2)),
                "pressure": xr.DataArray(np.random.rand(2)),
            }
        )
        # Assign the Dataset to a single cell with .loc
        df.loc[idx, "xr_dataset"] = ds
I have also tried storing my subsets in a list and assigning that list to the DataFrame as a new column, but that fails as well.
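One variant of the list approach that might sidestep the conversion is to pre-allocate a NumPy array with dtype=object and fill it one element at a time, so that nothing ever has to turn the whole list of datasets into an array. This is only a minimal sketch under that assumption: the helper name `as_object_array` and the `subsets` list are made up for illustration, and I have not checked it against every pandas/xarray version.

```python
import numpy as np
import pandas as pd
import xarray as xr


def as_object_array(items):
    """Hypothetical helper: pack arbitrary Python objects into a 1-D object array."""
    arr = np.empty(len(items), dtype=object)
    # Fill cell by cell; an integer index on an object array stores the object itself
    for i, item in enumerate(items):
        arr[i] = item
    return arr


df = pd.DataFrame({"id": [1, 2, 3]})

# Illustrative stand-ins for the real subsets, one Dataset per row
subsets = [
    xr.Dataset(
        {
            "temperature": ("x", np.random.rand(2)),
            "pressure": ("x", np.random.rand(2)),
        }
    )
    for _ in df.index
]

# The array is already 1-D with dtype=object, so each cell holds a whole Dataset
df["xr_dataset"] = as_object_array(subsets)
```

Each cell can then be read back with something like `df["xr_dataset"].iloc[0]`, which should return the original Dataset object rather than a converted copy.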
This is not super important for me to solve, as I can use other ways to handle my data. But at this point I'm just curious if this is possible at all.
Thanks for your time!