Xarray reads the data in my file as coordinates (apparently indexing coordinates). How do I convert them to data variables?

Question

I am working with the file at:

https://satdat.ngdc.noaa.gov/sem/poes/data/processed/ngdc/uncorrected/full/2013/metop01/poes_m01_20130525_proc.nc

When I read it in using xarray:

import xarray as xr
ds = xr.open_dataset('poes_m01_20130525_proc.nc')

All of the variables are read in as coordinates, and at least some of them are indexing coordinates. I only know the latter because, when I try to convert them to data variables using

ds.reset_coords()

I get the error:

ValueError: cannot remove index coordinates with reset_coords.

The error message appears to list all of the variables (it is a very long list).

I can convert all of the coordinate variables into numpy arrays and rebuild a new Dataset manually. However, I am very new to xarray. Is there a more elegant way to do this? For instance, can I convert the indexing coordinates to non-indexing coordinates and then use reset_coords? Also, how do I tell which coordinates are indexing coordinates and which are not?

Or, better, is there some option I should be using when reading the file that I don't know about? I don't recognize anything in the documentation that suggests this, but there is a lot in the documentation that I don't understand.

Thanks for any help!

OriolAbril · Accepted Answer · 2020-05-27 21:27:13Z


As you have guessed, to convert a coordinate to a data variable, it must be a non-indexing coordinate. You'll recognize indexing coordinates because they have a * right before them when the coordinates are listed. In your example, it looks like every single variable is treated as its own coordinate (no idea why; I'm not a NetCDF expert).
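Besides the * in the printed listing, the same information can be checked programmatically: indexing coordinates are the ones that appear in ds.indexes. A minimal sketch on a small made-up dataset (the names flux and lat are placeholders, not taken from the POES file):

```python
import numpy as np
import xarray as xr

# Toy dataset (placeholder names): "time" and "flux" are dimension
# coordinates, while "lat" is a plain coordinate defined along "time".
ds = xr.Dataset(
    coords={
        "time": ("time", np.arange(3)),
        "flux": ("flux", np.array([1.0, 2.0, 3.0])),
        "lat": ("time", np.array([10.0, 11.0, 12.0])),
    }
)

# Indexing coordinates are backed by an index (marked with * in the repr)
indexing = sorted(name for name in ds.coords if name in ds.indexes)
# Everything else is a non-indexing coordinate
non_indexing = sorted(name for name in ds.coords if name not in ds.indexes)
print(indexing)      # ['flux', 'time']
print(non_indexing)  # ['lat']
```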

To convert an indexing coordinate into a non-indexing coordinate, you can use reset_index, which requires specifying which indexes to reset. I took the liberty of assuming that the first handful of coordinates in your dataset are correctly set as coordinates and the rest should be data variables. In that case, the following code could solve the problem:

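# Every dimension except the "real" coordinates should become a data variable
var_names = list(
    set(ds.dims)
    - {"time", "year", "day", "msec", "satID", "sat_direction", "alt", "lat", "lon"}
)
# Drop the indexes of those variables, then move them to data variables
clean_ds = ds.reset_index(var_names).reset_coords()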

This leaves us with a not very useful dataset, though. reset_index has appended an underscore (_) to each variable name (to distinguish the non-indexing coordinate from the dimension with the same name), and each variable still has its own dimension. You'll probably want to do something similar to what is done in this other answer: Xarray: Make two DataArrays in the same Dataset use the same coordinate system
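A different technique, closer to the manual rebuild mentioned in the question, is to skip reset_index entirely and build a new Dataset with every variable on the shared time dimension. The sketch below assumes every variable is one-dimensional with the same length as time; the tiny in-memory dataset stands in for the opened file, and the names alt and flux are placeholders. Passing (dims, data, attrs) tuples also carries the attributes over.

```python
import numpy as np
import xarray as xr

n = 3
# Stand-in for the dataset as opened: each variable is an indexing coordinate
# on a dimension with its own name (names and values are placeholders).
ds = xr.Dataset(
    coords={
        "time": ("time", np.arange(n), {"units": "ms"}),
        "alt": ("alt", np.array([820.0, 821.0, 822.0]), {"units": "km"}),
        "flux": ("flux", np.array([1.0, 2.0, 3.0]), {"units": "counts"}),
    }
)

coord_names = ["time", "alt"]  # assumed to be the "real" coordinates
var_names = [name for name in ds.coords if name not in coord_names]

# Rebuild with everything on the shared "time" dimension; the
# (dims, data, attrs) tuples keep each variable's attributes.
clean_ds = xr.Dataset(
    data_vars={name: ("time", ds[name].values, ds[name].attrs) for name in var_names},
    coords={name: ("time", ds[name].values, ds[name].attrs) for name in coord_names},
)
print(clean_ds)
```

Only time ends up as an indexing coordinate; alt becomes a non-indexing coordinate along time, and flux becomes a data variable.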

Some ideas:

Get all variables to use time as their dimension:

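coord_names = ["time", "year", "day", "msec", "satID", "sat_direction", "alt", "lat", "lon"]
# Turn the remaining indexing coordinates into non-indexing ones (time_, year_, ...)
clean_ds = clean_ds.reset_index(coord_names)
# Rename every dimension to a single shared "time_" dimension
clean_ds = clean_ds.rename({name: "time_" for name in clean_ds.dims})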

Then rename the variables and coordinates (and the time_ dimension) to remove the trailing underscore from their names:

clean_ds = clean_ds.rename({f"{name}_": name for name in var_names + coord_names})

Had we used rename_vars instead, the time_ dimension would not have been renamed, which keeps the time coordinate and the time_ dimension under different names; the dimension can still be renamed separately afterwards.
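The difference between rename and rename_vars can be seen on a toy dataset (placeholder names, not from the original file):

```python
import xarray as xr

# Placeholder dataset: a variable and a dimension that both have a trailing underscore
ds = xr.Dataset({"flux_": ("time_", [1.0, 2.0, 3.0])})

# rename_vars only touches variable names; the "time_" dimension is left alone
only_vars = ds.rename_vars({"flux_": "flux"})
print(only_vars["flux"].dims)  # ('time_',)

# rename accepts variable and dimension names in the same mapping
both = ds.rename({"flux_": "flux", "time_": "time"})
print(both["flux"].dims)  # ('time',)
```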

After all the renaming and restructuring, the attributes from the original Dataset can be added back to clean_ds:

for var_name in ds.coords:
    # Copy each original variable's attributes onto its counterpart in clean_ds
    clean_ds[var_name] = clean_ds[var_name].assign_attrs(ds[var_name].attrs)
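A self-contained sketch of that restore step on made-up data, where clean stands in for a restructured dataset that lost its attributes. Copying the global attributes with clean.attrs.update(ds.attrs) is an extra step not in the answer above, included on the assumption that file-level metadata should be kept as well; the source attribute is hypothetical.

```python
import numpy as np
import xarray as xr

# Original dataset with attributes on a coordinate and on the dataset itself
ds = xr.Dataset(
    coords={"time": ("time", np.arange(3), {"units": "ms"})},
    attrs={"source": "example"},  # hypothetical global attribute
)

# Stand-in for the restructured dataset: same names, but attributes lost
clean = xr.Dataset(coords={"time": ("time", np.arange(3))})

for var_name in ds.coords:
    # assign_attrs returns a new object, so it is assigned back into the dataset
    clean[var_name] = clean[var_name].assign_attrs(ds[var_name].attrs)

# Optional extra: copy the dataset-level (global) attributes as well
clean.attrs.update(ds.attrs)
```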

edited May 27, 2020 at 21:27

answered May 25, 2020 at 22:10


6 Comments

Shawn Over a year ago

This also answers my question as to how to create non-indexing coordinates. Yes, I did want all of those as coordinates, but when I used my other method of simply building a new dataset I had not yet learned how to create a dataset with multiple, equivalent coordinates. I always ended up with multiple dimensions. Thanks!

OriolAbril Over a year ago

If you are creating xarray structures from scratch, you can also specify the dims and coordinates of each object: see the documentation on creating a DataArray, on creating a Dataset, and the Dataset API page. In the last link, note how, with Datasets for instance, you can pass a dict as data and, depending on the format of the dictionary, each entry will be interpreted as a data variable or a coordinate.
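For the case described above (several equivalent coordinates without creating extra dimensions), a minimal sketch: every coordinate is declared along the same time dimension, so only time becomes an indexing coordinate. The variable names and values are placeholders.

```python
import numpy as np
import xarray as xr

n = 4  # hypothetical number of samples
ds = xr.Dataset(
    # {name: (dims, data)} entries in data_vars become data variables
    data_vars={"flux": ("time", np.linspace(1.0, 4.0, n))},
    coords={
        # a coordinate named after its own dimension becomes the indexing coordinate
        "time": ("time", np.arange(n)),
        # extra coordinates along "time" are non-indexing; no new dimensions are created
        "lat": ("time", np.linspace(-10.0, 10.0, n)),
        "lon": ("time", np.linspace(0.0, 30.0, n)),
    },
)
print(ds)
```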

Shawn Over a year ago

So, I may have declared victory too soon. I just found out that when I use the approach above, I have lost all of the variable and coordinate attribute information. Is there a way to preserve that?

OriolAbril Over a year ago

I did not realize the DataArrays had attributes, and losing them was totally unexpected; thanks for pointing it out. I have updated the answer to add the attributes back into clean_ds from the original ds, and I'll try to dig deeper; it could be a bug :/

Shawn Over a year ago

Thank you, and please let me know whether it is a bug! I believe that pandas DataFrames can have attributes, but they are not guaranteed to survive all operations. I thought that was guaranteed with xarray, and that is one of its attractions for me. I hope it was simply a bug that will be fixed and not an accepted issue that won't be fixed!
