
I'm trying to resample an xarray Dataset with a custom function. From the post "Apply function along time dimension of XArray":

def special_mean(x, drop_min=False):
    s = np.sum(x)
    n = len(x)
    if drop_min:
        s = s - x.min()
        n -= 1
    return s/n

is an example of such a function (special_mean).

I have a dataset that is:

<xarray.Dataset>
Dimensions:  (lat: 100, lon: 130, time: 7305)
Coordinates:
  * lon      (lon) float32 -99.375 -99.291664 -99.208336 ... -88.708336 -88.625
  * lat      (lat) float32 49.78038 49.696426 49.61247 ... 41.552795 41.46884
    lev      float32 1.0
  * time     (time) datetime64[ns] 2040-01-01 2040-01-02 ... 2059-12-31
Data variables:
    tmin     (time, lat, lon) float32 dask.array<chunksize=(366, 100, 130), meta=np.ndarray>
    tmax     (time, lat, lon) float32 dask.array<chunksize=(366, 100, 130), meta=np.ndarray>
    prec     (time, lat, lon) float32 dask.array<chunksize=(366, 100, 130), meta=np.ndarray>
    relh     (time, lat, lon) float32 dask.array<chunksize=(366, 100, 130), meta=np.ndarray>
    wspd     (time, lat, lon) float32 dask.array<chunksize=(366, 100, 130), meta=np.ndarray>
    rads     (time, lat, lon) float32 dask.array<chunksize=(366, 100, 130), meta=np.ndarray>
Attributes:
    history:  Fri Jun 14 10:32:22 2019: ncatted -a _FillValue,,o,d,9e+20 IBIS...

And then I apply a resample that is:

data.resample(time='1MS').map(special_mean)


<xarray.Dataset>
Dimensions:  (time: 240)
Coordinates:
  * time     (time) datetime64[ns] 2040-01-01 2040-02-01 ... 2059-12-01
    lev      float32 1.0
Data variables:
    tmin     (time) float32 dask.array<chunksize=(1,), meta=np.ndarray>
    tmax     (time) float32 dask.array<chunksize=(1,), meta=np.ndarray>
    prec     (time) float32 dask.array<chunksize=(1,), meta=np.ndarray>
    relh     (time) float32 dask.array<chunksize=(1,), meta=np.ndarray>
    wspd     (time) float32 dask.array<chunksize=(1,), meta=np.ndarray>
    rads     (time) float32 dask.array<chunksize=(1,), meta=np.ndarray>

How do I apply this function so that the 'lon' and 'lat' coordinates are retained, as they are when doing

data.resample(time='1MS').mean()

2 Answers

Here's one example of how you can use xr.apply_ufunc().

import numpy as np
import xarray as xr
data = xr.tutorial.open_dataset('air_temperature')

def special_mean(x, drop_min=False):
    s = np.sum(x)
    n = len(x)
    if drop_min:
        s = s - x.min()
        n -= 1
    return s/n

def special_func(data):
    return xr.apply_ufunc(special_mean, data, input_core_dims=[["time"]], 
            kwargs={'drop_min': True}, dask = 'allowed', vectorize = True)

data.resample(time='1MS').map(special_func)

<xarray.Dataset>
Dimensions:  (lat: 25, lon: 53, time: 24)
Coordinates:
  * time     (time) datetime64[ns] 2013-01-01 2013-02-01 ... 2014-12-01
  * lat      (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
Data variables:
    air      (time, lat, lon) float64 244.6 244.7 244.7 ... 297.7 297.7 297.7
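
As far as I understand it, this keeps lat and lon because input_core_dims=[["time"]] moves time to the last axis and vectorize=True wraps the function with numpy.vectorize, so special_mean only ever sees a 1-D time series for one (lat, lon) point. Here is a rough numpy-only sketch of that idea. The array shape and the random data are made up for illustration, and special_mean is written with n -= 1 inside the if block.

```python
import numpy as np


def special_mean(x, drop_min=False):
    s = np.sum(x)
    n = len(x)
    if drop_min:
        s = s - x.min()
        n -= 1
    return s / n


# Made-up block of data with time as the LAST axis: (lat, lon, time).
rng = np.random.default_rng(0)
block = rng.random((3, 4, 31))

# Calling the function on the whole block reduces over every axis,
# so lat and lon are lost and a single number comes back.
collapsed = special_mean(block)

# Wrapping it with a "(t)->()" signature calls it once per (lat, lon)
# point on a 1-D slice, which is a Python-level loop over all points.
per_point = np.vectorize(
    lambda v: special_mean(v, drop_min=True), signature="(t)->()"
)(block)

print(np.ndim(collapsed), per_point.shape)
```

That per-point loop is also the likely reason vectorize=True gets slow on large grids.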

3 Comments

As shown in the documentation, vectorize=True is very slow for a dataset of my size. Do you have any idea how I can make it a bit quicker?
Also, is there a way to pass variables into special_func that are out of scope? The use case would be dropping the minimum only once versus dropping it twice, or something like that.
@blueduckyy try converting your data into a dask array using .chunk() and then switch the argument to dask='parallelized' :) It'll allow the ufunc to operate lazily on your data using dask, you can then load the data into memory later on using da.compute(). Have a look at my other answer, see if it helps. :) stackoverflow.com/questions/38960903/…
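
On the two questions above (speed and passing extra parameters), one possible approach is to skip vectorize=True by writing the function so it reduces along the last axis itself, and to pass the parameter through a closure. This is only a sketch under assumptions: special_mean_axis, n_drop and make_resampler are names I made up, and the small random dataset just stands in for the real one.

```python
import numpy as np
import pandas as pd
import xarray as xr


def special_mean_axis(x, n_drop=0, axis=-1):
    """Mean along `axis` after discarding the `n_drop` smallest values."""
    x = np.asarray(x)
    # Sorting puts the smallest values first, so they can be sliced off.
    ordered = np.sort(x, axis=axis)
    kept = np.take(ordered, np.arange(n_drop, x.shape[axis]), axis=axis)
    return kept.mean(axis=axis)


def make_resampler(n_drop):
    """Build a function for .map() that remembers n_drop (hypothetical helper)."""
    def _func(group):
        return xr.apply_ufunc(
            special_mean_axis,
            group,
            input_core_dims=[["time"]],  # time becomes the last axis
            kwargs={"n_drop": n_drop, "axis": -1},
        )
    return _func


# Synthetic stand-in data: 60 daily steps (January and February 2040).
rng = np.random.default_rng(0)
ds = xr.Dataset(
    {"tmin": (("time", "lat", "lon"), rng.random((60, 3, 4)))},
    coords={
        "time": pd.date_range("2040-01-01", periods=60, freq="D"),
        "lat": [0.0, 1.0, 2.0],
        "lon": [0.0, 1.0, 2.0, 3.0],
    },
)

# n_drop=1 drops the minimum once; n_drop=2 would drop the two smallest.
monthly = ds.resample(time="1MS").map(make_resampler(n_drop=1))
print(monthly)
```

The example uses in-memory data. For dask-backed data you would presumably add dask='parallelized' as suggested above, and I believe time then has to sit in a single chunk within each group, so a rechunk may be needed first.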

I suspect that you can do what you want with the apply_ufunc method.

(Disclaimer: I do not know the xarray API well.)

1 Comment

How? I've tried using apply_ufunc and I'm not sure I understand how it works. Where does it go?
