1

Background

I am attempting to slice a NetCDF file using a bounding box of lat/lons. The relevant information of this file is listed below (variables, shape, dimensions):

enter image description here

Per most answers here and standard tutorials, this should be very straightforward, and my interpretation is that you just find the indices of the lat/lons and slice the variable array by those indices.

Attempt/Code

 def netcdf_worker(nc_file, bbox):
    dataset = Dataset(nc_file)
    for variable in dataset.variables.keys():
        if (variable != 'lat') and (variable != 'lon'):
            var_name = variable
            break

    # Full extent of data
    lats = dataset.variables['lat'][:]
    lons = dataset.variables['lon'][:]

    if bbox:
        lat_bnds = [bbox[0], bbox[2]]  # min lat, max lat
        lon_bnds = [bbox[1], bbox[3]]  # min lon, max lon
        lat_inds = np.where((lats > lat_bnds[0]) & (lats < lat_bnds[1]))
        lon_inds = np.where((lons > lon_bnds[0]) & (lons < lon_bnds[1]))

        var_subset = dataset.variables[var_name][:, lat_inds[0], lon_inds[0]]

        # would also be great to slice the lats and lons too for visualization

Problem

When attempting to implement the solutions found in other answers listed on SO via the above code, I am met with the error:

File "/Users/XXXXXX/Desktop/Viewer/viewer.py", line 41, in netcdf_worker
    var_subset = dataset.variables[var_name][:, lat_inds[0], lon_inds[0]]
  File "netCDF4/_netCDF4.pyx", line 4095, in netCDF4._netCDF4.Variable.__getitem__
  File "/Users/XXXXXX/Viewer/lib/python3.6/site-packages/netCDF4/utils.py", line 242, in _StartCountStride
    ea = np.where(ea < 0, ea + shape[i], ea)
IndexError: tuple index out of range

I believe there is something minor I am missing/not understanding about slicing multidimensional arrays and would appreciate any help. I am not interested in any solutions that bring any other packages or operate external to python (no CDO or NCKS answers please!). Thank you for your help.

1 Answer 1

6

In Python, I think that the easiest solution is to use xarray. Minimal example (using some ERA5 data):

import xarray as xr

f = xr.open_dataset('model_fc.nc')

print(f['latitude'].values)  # [52.771 52.471 52.171 51.871 51.571 51.271 50.971]
print(f['longitude'].values) # [3.927 4.227 4.527 4.827 5.127 5.427 5.727]

f2 = f.sel(longitude=slice(4.5, 5.4), latitude=slice(52.45, 51.5))  

print(f2['latitude'].values)  # [52.171 51.871 51.571]
print(f2['longitude'].values) # [4.527 4.827 5.127]

As an example I'm only showing the latitude and longitude variables, but all variables in the NetCDF file which have latitude and longitude dimensions are sliced automatically.


Alternatively, if you want to select the box manually (using NetCDF4):

import netCDF4 as nc4
import numpy as np

f = nc4.Dataset('model_fc.nc')

lat = f.variables['latitude'][:]
lon = f.variables['longitude'][:]

# All indices in bounding box:
where_j = np.where((lon >= 4.5) & (lon <= 5.4))[0]
where_i = np.where((lat >= 51.5) & (lat <= 52.45))[0]

# Start and end+1 indices in each dimension:
i0 = where_i[0]
i1 = where_i[-1]+1

j0 = where_j[0]
j1 = where_j[-1]+1

print(lat[i0:i1])  # [52.171 51.871 51.571]
print(lon[j0:j1])  # [4.527 4.827 5.127]

Now of course you have to slice each data array manually, using e.g. var_slice = var[j0:j1, i0:i1]

Sign up to request clarification or add additional context in comments.

1 Comment

Just as a reference (in case it helps someone else), CDO can do these operation with just one command: cdo sellonlatbox,4.5,5.4,51.5,52.45 model_fc.nc model_fc_slice.nc

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.