I have global daily radiation data for 19 years, split into one netCDF file per day (around 7,000 files). I load all of them together as a single xarray dataset. This takes some time but works fine thanks to Dask.
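For context, this is roughly how I load the files. The snippet below is a self-contained sketch: it generates a few tiny one-day files in a temp directory (stand-ins for my real data), then combines them lazily the way I do.

```python
import pathlib
import tempfile

import numpy as np
import pandas as pd
import xarray as xr

# Create a few tiny one-day files so this snippet is self-contained;
# my real data is ~7,000 daily global files.
tmp = pathlib.Path(tempfile.mkdtemp())
for i, day in enumerate(pd.date_range("2000-01-01", periods=3)):
    day_ds = xr.Dataset(
        {"PAR": (("time", "lat", "lon"), np.random.rand(1, 4, 8))},
        coords={
            "time": [day],
            "lat": np.linspace(-90, 90, 4),
            "lon": np.linspace(-180, 180, 8),
        },
    )
    day_ds.to_netcdf(tmp / f"day_{i}.nc")

# Lazily combine all files along the time axis; parallel=True opens
# them concurrently via Dask, so nothing is read into memory yet.
combined = xr.open_mfdataset(str(tmp / "*.nc"), combine="by_coords", parallel=True)
print(combined.sizes)  # time: 3, lat: 4, lon: 8
```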
I then clip this dataset to my region of interest, and now I want to save the clipped dataset to disk. I tried saving it as a single netCDF file, but the file would be too big (around 100 GB) and would take hours to write.
Then I came across Zarr and read that it can write to disk in parallel and is therefore faster than writing to netCDF. Here is the code I am using to save the dataset as a Zarr store:
```python
zarr_store_path = r"T:\Europe_model\Data\Combined_datasets\BESS_PAR\BESS_PAR_combined.zarr"
PAR_clipped.to_zarr(zarr_store_path, mode='w', consolidated=True)
```
However, I get this error every time:

```
module 'zarr' has no attribute 'open_group'
```
The versions I am using: xarray 2024.11.0, zarr 3.0.7.
What am I doing wrong? How can I use Zarr to speed up writing the dataset to disk? Also, is there another way to store very large xarray datasets faster than writing to netCDF?