One possible solution to this is to open the hdf5-file using netCDF4 in diskless non-persistence mode:
ncf = netCDF4.Dataset(hdf5file, diskless=True, persist=False)
Now you can inspect the file contents including groups.
After that you can make use of xarray.backends.NetCDF4DataStore to open the wanted hdf5-groups (xarray can only get hold of one hdf5-group at a time):
nch = ncf.groups.get('hdf5-name')
xds = xarray.open_dataset(xarray.backends.NetCDF4DataStore(nch))
This will give you a dataset xds with all attributes and variables (datasets) of the
group hdf5-name. Note that you will not get access to sub-groups. You would need to claim subgroups by the same mechanism. If you want to apply dask, you would need to add the keyword chunking with wanted values.
There is no (real) automatism for decoding data like this could be done for NetCDF files. If you have a integer compressed 2d variable (dataset) var with some attributes gain and offset you can add the NetCDF specific attributes scale_factor and add_offset to the variable:
var = xds['var']
var.attrs['scale_factor'] = var.attrs.get('gain')
var.attrs['add_offset'] = var.attrs.get('offset')
ds = xarray.decode_cf(xds)
This will decode your variable using netcdf mechanisms.
Additionally you could try to give the extracted dimension useful names (you will get something like phony_dim_0, phony_dim_1, ..., phony_dim_N) and assign new (as in example) or existing variables/coordinates to those dimensions to gain as much of the xarray machinery:
var = xds['var']
var.attrs['scale_factor'] = var.attrs.get('gain')
var.attrs['add_offset'] = var.attrs.get('offset')
dims = var.dims
xds['var'] = var.rename({dims[0]: 'x', dims[1]: 'y'})
xds = xds.assign({'x': (['x'], xvals, xattrs)})
xds = xds.assign({'y': (['y'], yvals, yattrs)})
ds = xarray.decode_cf(xds)
References:
xarraymodule, However,h5pyaccesses HDF5 data asnumpyrecord arrays. So, you simply need to access a hdf5 dataset as a record array and manipulate the data into a xarray dataset format.