I have a NetCDF dataset (link to file), obtained from Copernicus Climate Data, containing ~43k unevenly spaced values around the world. Instead of being indexed by (lat, lon), it is indexed by a sequence of 'stations'.
My understanding is that station_x_coordinate and station_y_coordinate are not treated as dimension coordinates by xarray (according to the xarray docs); they are only coordinate variables along the stations dimension.
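The distinction can be seen on a small synthetic dataset built to mimic the structure printed further down (the values and the size are made up). Only stations is backed by an index, so it is the only name that .sel() can look up; the x/y positions are plain coordinate variables along that dimension.

```python
import numpy as np
import xarray as xr

# Hypothetical toy dataset mimicking the question's layout: one dimension
# ("stations") with the x/y positions stored as non-index coordinates.
rng = np.random.default_rng(0)
n = 5
ds = xr.Dataset(
    data_vars={"return_mean_surge_level": ("stations", rng.random(n))},
    coords={
        "stations": np.arange(n, dtype="uint16"),
        "station_x_coordinate": ("stations", rng.uniform(-180, 180, n)),
        "station_y_coordinate": ("stations", rng.uniform(-90, 90, n)),
    },
)

# Only "stations" has an index, so only it can be used with .sel(...).
print(list(ds.indexes))  # ['stations']
print(list(ds.coords))   # all three coordinate names
```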
I have a list of my own locations for which I want to get the closest value (i.e., my locations do not necessarily coincide with any of the data points in the NetCDF file).
I would like to use xarray's many selection and interpolation methods (e.g. xarray.Dataset.sel to get values at locations that do not match a data point). But I get errors or unexpected results, which I guess are related to the fact that I don't have real dimension coordinates, only coordinate variables.
For example, I can use xarray.Dataset.sel(stations=11.5, method='nearest'), which returns the closest value (in this case, station = 12), but this is very unintuitive, as I have no idea how the stations are ordered in the file. I would rather search by the more sensible (lat, lon) of my target locations. To be clear, I would like to call something like xarray.Dataset.sel(x=0, y=0, method='nearest') to find the closest value to the location (lat = 0, lon = 0).
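Without a spatial index, a nearest lookup by position can still be done with plain NumPy on the two coordinate variables, followed by isel. The sketch below assumes station_x_coordinate is longitude and station_y_coordinate is latitude, both in degrees; nearest_station is a hypothetical helper, not an xarray API. It uses the haversine great-circle distance, so points on either side of the date line are compared correctly.

```python
import numpy as np
import xarray as xr


def nearest_station(ds, lon, lat):
    """Hypothetical helper: return the station in `ds` closest to (lon, lat).

    Assumes station_x_coordinate = longitude and station_y_coordinate =
    latitude, both in degrees.
    """
    lon1 = np.deg2rad(ds["station_x_coordinate"].values)
    lat1 = np.deg2rad(ds["station_y_coordinate"].values)
    lon2, lat2 = np.deg2rad(lon), np.deg2rad(lat)
    # Haversine formula for the angular (great-circle) distance.
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    dist = 2 * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))
    # argmin is a position along "stations", not a station label, hence isel.
    return ds.isel(stations=int(np.argmin(dist)))


# Toy dataset with the same layout as the question's file (values made up).
ds = xr.Dataset(
    {"return_mean_surge_level": ("stations", [0.1, 0.2, 0.3, 0.4])},
    coords={
        "stations": np.array([0, 5, 12, 40], dtype="uint16"),
        "station_x_coordinate": ("stations", [0.0, 10.0, 170.0, -170.0]),
        "station_y_coordinate": ("stations", [0.0, 5.0, 0.0, 0.0]),
    },
)

print(int(nearest_station(ds, lon=-179.0, lat=0.0)["stations"]))  # 40 (across the date line)
print(int(nearest_station(ds, lon=0.0, lat=0.0)["stations"]))     # 0
```

Because np.argmin returns a position rather than a station label, the selection goes through isel; the label that comes back (40 in the toy example) is generally not equal to that position, since the station numbers in the real file skip values.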
So far I have been able to get all the data points of the NetCDF file into a regular pandas DataFrame (columns = ['lat', 'lon', 'value']) and save it as CSV. I could write my own function to find the nearest neighbors in 2D space for a given target location (surely this already exists in some library, maybe even numpy or GeoPandas), but as I said, I would like to use xarray's built-in methods to get values at 'unmatched' locations, interpolate, etc.
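For many target locations at once, scipy.spatial.KDTree already provides this kind of nearest-neighbour search. One caveat: a KD-tree on raw (lon, lat) degrees ignores the sphere and the date line, so this sketch first converts positions to 3D points on the unit sphere, where straight-line distance ranks neighbours the same way as great-circle distance. The toy dataset, the target locations and the helper name lonlat_to_xyz are hypothetical. The positions found are then fed back into xarray with vectorised (pointwise) indexing, so the result is still an xarray Dataset.

```python
import numpy as np
import xarray as xr
from scipy.spatial import KDTree


def lonlat_to_xyz(lon, lat):
    """Hypothetical helper: degrees -> points on the unit sphere, so Euclidean
    (chord) distance ranks neighbours like great-circle distance does."""
    lon, lat = np.deg2rad(lon), np.deg2rad(lat)
    return np.column_stack([np.cos(lat) * np.cos(lon),
                            np.cos(lat) * np.sin(lon),
                            np.sin(lat)])


# Toy dataset with the same layout as the question's file (values made up).
ds = xr.Dataset(
    {"return_mean_surge_level": ("stations", [0.1, 0.2, 0.3, 0.4])},
    coords={
        "stations": np.array([0, 5, 12, 40], dtype="uint16"),
        "station_x_coordinate": ("stations", [0.0, 10.0, 170.0, -170.0]),
        "station_y_coordinate": ("stations", [0.0, 5.0, 0.0, 0.0]),
    },
)

# Build the tree once over all stations...
tree = KDTree(lonlat_to_xyz(ds["station_x_coordinate"].values,
                            ds["station_y_coordinate"].values))

# ...then query any number of (hypothetical) target locations at once.
targets_lon = np.array([1.0, -179.0, 9.0])
targets_lat = np.array([1.0, 0.0, 4.0])
_, idx = tree.query(lonlat_to_xyz(targets_lon, targets_lat))

# Pointwise indexing: one selected station per target, along a new "points" dim.
nearest = ds.isel(stations=xr.DataArray(idx, dims="points"))
print(nearest["stations"].values)  # station ids 0, 40, 5
```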
What would you do? For example, can I generate a new NetCDF file whose structure makes station_x_coordinate and station_y_coordinate of my current file real dimension coordinates? I am no expert in NetCDF files, but I assume that this way I could use the methods mentioned above. Does that make sense?
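Turning station_x_coordinate and station_y_coordinate directly into dimensions would give a mostly empty array of up to roughly 43119 × 43119 cells, since nearly every station has its own unique x and y. A more practical reading of the idea is to regrid: interpolate the scattered values onto a regular lon/lat grid, after which lat and lon are real dimension coordinates and sel(..., method='nearest') and interp work as usual. Below is a minimal sketch using scipy.interpolate.griddata on a made-up five-station example. The grid extent and resolution are arbitrary, the interpolation treats lon/lat as a flat plane (no date-line wrap), cells outside the stations' convex hull come out as NaN with method="linear", and values between distant stations are interpolated rather than taken from the data.

```python
import numpy as np
import xarray as xr
from scipy.interpolate import griddata

# Toy stand-in for the scattered station data (values are made up).
ds = xr.Dataset(
    {"return_mean_surge_level": ("stations", [0.1, 0.2, 0.3, 0.4, 0.26])},
    coords={
        "stations": np.array([0, 5, 12, 40, 41], dtype="uint16"),
        "station_x_coordinate": ("stations", [0.0, 10.0, 0.0, 10.0, 4.0]),
        "station_y_coordinate": ("stations", [0.0, 0.0, 10.0, 10.0, 6.0]),
    },
)

# Target regular grid; extent and 0.5-degree resolution are arbitrary choices.
lon = np.arange(0.0, 10.5, 0.5)
lat = np.arange(0.0, 10.5, 0.5)
lon2d, lat2d = np.meshgrid(lon, lat)  # both have shape (len(lat), len(lon))

points = np.column_stack([ds["station_x_coordinate"].values,
                          ds["station_y_coordinate"].values])
gridded = griddata(points, ds["return_mean_surge_level"].values,
                   (lon2d, lat2d), method="linear")

# lat and lon are now real dimension coordinates, so sel/interp work directly.
regular = xr.Dataset(
    {"return_mean_surge_level": (("lat", "lon"), gridded)},
    coords={"lat": lat, "lon": lon},
)
print(regular.sel(lat=2.1, lon=7.9, method="nearest")["return_mean_surge_level"].item())
print(regular.interp(lat=6.2, lon=4.3)["return_mean_surge_level"].item())
# regular.to_netcdf("regridded.nc")  # needs a NetCDF backend, e.g. netCDF4
```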
This is the structure of the NetCDF file (the printed xarray.Dataset):
```
<xarray.Dataset>
Dimensions:                  (stations: 43119)
Coordinates:
  * stations                 (stations) uint16 0 1 2 3 ... 43731 43732 43733
    station_x_coordinate     (stations) float64 ...
    station_y_coordinate     (stations) float64 ...
Data variables:
    return_mean_surge_level  (stations) float64 ...
Attributes: (12/34)
    Conventions:                   CF-1.6
    featureType:                   timeSeries
    id:                            GTSMv3_extreme_value_analysis
    naming_authority:              https://deltares.nl/en
    Metadata_Conventions:          Unidata Dataset Discovery v1.0
    title:                         relative change in return values for surge...
    ...                            ...
    geospatial_vertical_max:       18.564
    geospatial_vertical_units:     m
    geospatial_vertical_positive:  up
    time_coverage_start:           1985
    time_coverage_end:             2050
    experiment:                    highres-future
```