How can I initialize a Zarr file that is larger than available memory?

Question

My workflow generates an xr.Dataset with dimensions (6, 36, 2, 13, 699, 1920) and dtype float32.
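For a sense of scale, the uncompressed size follows directly from the shape and dtype. A quick check, assuming 4 bytes per float32 value and no compression:

```python
from math import prod

shape = (6, 36, 2, 13, 699, 1920)
bytes_per_value = 4  # float32

n_values = prod(shape)
size_bytes = n_values * bytes_per_value
size_gib = size_bytes / 2**30

print(f"{n_values:,} values, {size_gib:.1f} GiB uncompressed")
```

That is roughly 28 GiB, which is why the array cannot simply be built in memory first.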

I can process and write the output array chunk by chunk, but only if the Zarr store already exists, using:

ds.to_zarr('data.zarr', region=region)

Does anyone have an idea how to initialize a zarr file that is larger than available memory?

My libraries are:

zarr-python: '2.18.4'
xarray: '2025.1.2'

AMA · Accepted Answer · 2025-03-19 14:27:49Z

I was able to do it with `dask.array`, which defines the array lazily so it never has to fit in memory at once.

import dask.array as da
import numpy as np
import xarray as xr

coords = ...
dims = ...
var_name = 'value'
chunks = (1, 13, 36, 128, 128)
encoding = {var_name: {'chunks': chunks}}
store = 'test.zarr'

# Lazily defined NaN-filled array; chunks are only materialized as they are written
daskarray = da.full(
    (6, 13, 36, 699, 1920),
    np.nan,
    chunks=chunks,
    dtype='float32',
)

xr.DataArray(
    daskarray,
    coords=coords,
    dims=dims,
).to_dataset(name=var_name).to_zarr(store, mode='w', encoding=encoding)
