An easy logic is to do a runsum on the binary field using the rolling sum function with a window size equal to your event length N, identify the Ns with a logical expression and then just add those up. Thus you can do it in these two lines:
event_len=3 # this is N
ds_sum = ds.rolling(time=event_len, center=False).sum()
events = (ds_sum == event_len).astype(int).sum(dim="time")
If this logic is unclear, I actually have a youtube video posted on this on my channel for further clarification.
However, as pointed out in the comment by @Patrick, while this works fine for continuous data, it has a minor error if your data is only for specific months, e.g. July, as you may get a heatwave that consists of the last day of July in year X and then the first two days of July in year X+1 for example!!! To avoid this, we want to effectively turn the 3D [lon,lat,time] array into a 4D [lon,lat,year,day] array and apply the rolling function on the last "day" time dimension for each year separately. Luckily for us, there is the xarray function groupby which is ideal for this and we then just use apply like this.
events_per_year = ds['data'].groupby(ds.time.dt.year).apply(
lambda x: (x.rolling(time=event_len, center=False).sum() == event_len).astype(int).sum(dim="time")
)
I made some dummy July data for 4 years up to illustrate this for a small 3x3 grid (so this is reproducible after the OPs netcdf link disappears in the future):
import xarray as xr
import numpy as np
import pandas as pd
# make up some dummy July data for a few year on a small grid:
time = pd.date_range(start='2010-07-01', end='2013-07-31', freq='D')
time = time[time.month == 7]
lon = np.arange(0, 3, 1)
lat = np.arange(0, 3, 1)
data = np.random.randint(0, 2, size=(len(time), len(lat), len(lon)))
ds = xr.Dataset(
{"data": (("time", "lat", "lon"), data)},
coords={"time": time, "lon": lon, "lat": lat},)
### now we define events of this length:
event_len=3
# this function wraps up 3 steps:
# 1. Calculate the rolling sum for 'event_len' days for each year
# 2. Check where this rolling sum equals 'event_len' which I put 3 days here as per the OP def
# 3. Sum the true/false 1/0 in time dim
events_per_year = ds['data'].groupby(ds.time.dt.year).apply(
lambda x: (x.rolling(time=event_len, center=False).sum() == event_len).astype(int).sum(dim="time")
)
print(events_per_year)
This gives the following output
<xarray.DataArray (year: 4, lat: 3, lon: 3)>
array([[[3, 2, 2],
[0, 6, 8],
[3, 5, 4]],
[[8, 2, 8],
[4, 5, 6],
[7, 4, 4]],
[[9, 3, 1],
[6, 4, 3],
[2, 2, 7]],
[[1, 3, 3],
[2, 6, 8],
[3, 2, 0]]])
Coordinates:
* lat (lat) int64 0 1 2
* lon (lon) int64 0 1 2
* year (year) int64 2010 2011 2012 2013