
I have global daily radiation data for 19 years. It is divided into one netCDF file for every day (so around 7000 files). I am loading all the files together as a single xarray dataset. This takes some time but works fine because of Dask.
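The loading step described above is typically done with `xr.open_mfdataset`, which concatenates the daily files lazily with Dask. A minimal sketch, assuming each file holds one day with a `time` coordinate (the variable name `PAR` and the tiny stand-in files are illustrative, not from the question):

```python
# Sketch: combine many single-day netCDF files into one lazy dataset.
# Assumes xarray, dask, and a netCDF engine (netcdf4 or scipy) are installed;
# the variable name "PAR" and the three stand-in files are illustrative.
import tempfile, pathlib
import numpy as np
import pandas as pd
import xarray as xr

tmp = pathlib.Path(tempfile.mkdtemp())
# Create three tiny stand-ins for the ~7000 daily files.
for i, day in enumerate(pd.date_range("2000-01-01", periods=3)):
    ds = xr.Dataset(
        {"PAR": (("time", "lat", "lon"), np.random.rand(1, 4, 4))},
        coords={"time": [day], "lat": np.arange(4), "lon": np.arange(4)},
    )
    ds.to_netcdf(tmp / f"day_{i:02d}.nc")

# open_mfdataset concatenates lazily along "time"; chunks= keeps it Dask-backed.
combined = xr.open_mfdataset(
    str(tmp / "day_*.nc"), combine="by_coords", chunks={"time": 1}
)
print(combined.sizes["time"])
```

With `chunks=` set, nothing is read into memory until a computation or write is triggered, which is why the load itself "works fine because of Dask".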

Then I clip this dataset to my region of interest, and now I want to save the clipped xarray dataset to disk. I tried saving it as a single netCDF file, but this file would be too big (around 100 GB) and it would take hours to write to disk.

Then I came across zarr and read that it can write to disk using parallel processing and hence is faster than writing to netCDF. Here is the code I am using to save the dataset as a zarr file:

zarr_store_path = r"T:\Europe_model\Data\Combined_datasets\BESS_PAR\BESS_PAR_combined.zarr"
PAR_clipped.to_zarr(zarr_store_path, mode='w', consolidated=True)

However, I get this error every time:

module 'zarr' has no attribute 'open_group'

The versions I am using: xarray 2024.11.0, zarr 3.0.7.
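Those two versions are the likely culprit: zarr-python 3.x changed its API, and xarray releases from before roughly 2025.1 only support zarr-python 2.x, so xarray's internal calls fail with `AttributeError`s like the one above. The usual fix is to either pin `zarr<3` or upgrade xarray. A small sketch of a pre-flight check (the 2025.1 threshold is an assumption based on release notes, not an exact compatibility matrix):

```python
# Sketch: detect the xarray / zarr-python 3 version mismatch before writing.
# The (2025, 1) cutoff is an assumption from xarray release notes.
from importlib.metadata import version, PackageNotFoundError

def version_tuple(ver: str) -> tuple:
    """Parse a version string like '2024.11.0' into (2024, 11)."""
    return tuple(int(p) for p in ver.split(".")[:2])

def zarr3_mismatch(xr_ver: str, zarr_ver: str) -> bool:
    """True when zarr-python 3.x is paired with an xarray too old to use it."""
    return version_tuple(zarr_ver) >= (3, 0) and version_tuple(xr_ver) < (2025, 1)

try:
    if zarr3_mismatch(version("xarray"), version("zarr")):
        print("Incompatible: pin `zarr<3` or upgrade xarray to >= 2025.1.")
except PackageNotFoundError:
    pass  # one of the packages is not installed in this environment
```

With xarray 2024.11.0 and zarr 3.0.7 this check flags the combination reported in the question.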

What am I doing wrong? How can I use zarr to speed up writing a dataset to disk? Also, is there another way to store very large xarray datasets to disk faster than writing to netCDF?

  • I think I figured out the solution. The problem was how the dependencies were interacting with each other: the version of zarr was not entirely compatible with the version of xarray I was using. I don't know exactly how it got fixed, but I uninstalled and reinstalled some libraries a couple of times and then it finally worked. Commented May 13 at 22:52

