4

I'm calculating EWMA values for an array of streamflow data, and my code looks like this:

import polars as pl
import numpy as np

streamflow_data = np.arange(0, 20, 1)
adaptive_alphas = np.concatenate([np.repeat(0.3, 10), np.repeat(0.6, 10)])
streamflow_series = pl.Series(streamflow_data)
ewma_data = np.zeros_like(streamflow_data)
for i in range(1, len(streamflow_series)):
    current_alpha = adaptive_alphas[i]
    ewma_data[i] = streamflow_series[:i+1].ewm_mean(alpha=current_alpha)[-1]
# When ewma_data is initialized with a float dtype, the output is:
Output: [0  0.58823529  1.23287671  1.93051717  2.67678771  3.46668163  4.29488309  5.1560635   6.04512113  6.95735309  9.33379473 10.33353466 11.33342058 12.33337091 13.33334944 14.33334021 15.33333625 16.33333457 17.33333386 18.33333355]

# When I don't specify the dtype of ewma_data and streamflow_data is an int array, the output is truncated to integers:
Output: [0  0  1  1  2  3  4  5  6  6  9 10 11 12 13 14 15 16 17 18]

However, when streamflow_data is very long (for example, more than 100000 elements), this code becomes very slow.

How can I eliminate the for loop from my code without changing its result?

Thanks in advance for your reply.

5
  • Are the adaptive alphas actually adaptive or are they constant as in the example? Commented Jan 14 at 18:28
  • The alphas will switch between two values, such as 0.6 and 0.3. Commented Jan 14 at 19:20
  • @forestbat it would be nice to have an example with different alpha values and proper unrounded output, so people can check whether their approach works. Commented Jan 15 at 9:58
  • I have updated it. Commented Jan 15 at 10:07
  • If you change part of your code to np.zeros_like(streamflow_data, dtype=float), then you'll have float results in ewma_data, and the results are then the same as in my code. Commented Jan 15 at 10:20

4 Answers

3

If you have only a few alpha values and/or some condition that determines which alpha applies to which row, you could use pl.coalesce(), pl.when() and pl.Expr.ewm_mean():

import numpy as np
import polars as pl

df = pl.DataFrame({
    "adaptive_alpha": np.concatenate([np.repeat(0.3, 10), np.repeat(0.6, 10)]),
    "streamflow": np.arange(0, 20, 1)
})

df.with_columns(
    # One full-column EWMA per distinct alpha; each row keeps the value
    # computed with its own alpha (coalesce takes the first non-null branch)
    pl.coalesce(
        pl.when(pl.col.adaptive_alpha == alpha)
        .then(pl.col.streamflow.ewm_mean(alpha=alpha))
        for alpha in df["adaptive_alpha"].unique()
    ).alias("ewma")
).with_columns(ewma_int=pl.col.ewma.cast(pl.Int32))
shape: (20, 4)
┌────────────────┬────────────┬───────────┬──────────┐
│ adaptive_alpha ┆ streamflow ┆ ewma      ┆ ewma_int │
│ ---            ┆ ---        ┆ ---       ┆ ---      │
│ f64            ┆ i64        ┆ f64       ┆ i32      │
╞════════════════╪════════════╪═══════════╪══════════╡
│ 0.3            ┆ 0          ┆ 0.0       ┆ 0        │
│ 0.3            ┆ 1          ┆ 0.588235  ┆ 0        │
│ 0.3            ┆ 2          ┆ 1.232877  ┆ 1        │
│ 0.3            ┆ 3          ┆ 1.930517  ┆ 1        │
│ 0.3            ┆ 4          ┆ 2.676788  ┆ 2        │
│ …              ┆ …          ┆ …         ┆ …        │
│ 0.6            ┆ 15         ┆ 14.33334  ┆ 14       │
│ 0.6            ┆ 16         ┆ 15.333336 ┆ 15       │
│ 0.6            ┆ 17         ┆ 16.333335 ┆ 16       │
│ 0.6            ┆ 18         ┆ 17.333334 ┆ 17       │
│ 0.6            ┆ 19         ┆ 18.333334 ┆ 18       │
└────────────────┴────────────┴───────────┴──────────┘
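The selection idea above can also be checked without Polars. The following is a minimal, dependency-free sketch, assuming the adjusted EWMA definition (the adjust=True default of ewm_mean) and no missing values. The helpers adjusted_ewma and ewma_by_alpha are hypothetical names for illustration, not Polars functions. It computes one full EWMA per distinct alpha in a single pass each, then picks at every position the value from the series that matches that position's alpha.

```python
def adjusted_ewma(values, alpha):
    """Adjusted EWMA: weighted mean with weights (1 - alpha)**k for lag k."""
    out = []
    num = 0.0  # running weighted sum of values
    den = 0.0  # running sum of weights
    for x in values:
        num = x + (1.0 - alpha) * num
        den = 1.0 + (1.0 - alpha) * den
        out.append(num / den)
    return out


def ewma_by_alpha(values, alphas):
    """Pick, per position, the EWMA computed with that position's alpha."""
    # One O(n) pass per distinct alpha instead of one pass per element
    per_alpha = {a: adjusted_ewma(values, a) for a in set(alphas)}
    return [per_alpha[a][i] for i, a in enumerate(alphas)]


streamflow = list(range(20))
alphas = [0.3] * 10 + [0.6] * 10
ewma = ewma_by_alpha(streamflow, alphas)
print([round(v, 6) for v in ewma])
```

Under these assumptions the values agree with the float output in the question (for example, about 9.333795 at index 10, right after the alpha switches), because an EWMA over a prefix equals the prefix of the EWMA over the full series.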

3 Comments

Thanks for your reply, but it calculates two sequences independently rather than iterating over the array. Your algorithm's result is [0, 0.58, …, 17.33, 18.33], but the answer from the original algorithm is [0 0 1 1 2 3 4 5 6 6 9 10 11 12 13 14 15 16 17 18]. I have my own answer now, and I will post it under this question.
@forestbat there's rounding (or rather flooring) happening in your example. Is that what you want?
It only happens in this demo; in the actual program there is no rounding.
0
import numpy as np

def find_continuous_intervals_vectorized(arr):
    """Split a sorted index array into runs of consecutive indices."""
    if len(arr) == 0:
        return []
    # A new run starts wherever the gap to the previous index is not 1
    boundaries = np.where(np.diff(arr) != 1)[0]
    # Single-element runs are kept (the earlier len(interval) > 1 filter dropped them)
    return np.split(arr, boundaries + 1)

# streamflow_data, streamflow_series and adaptive_alphas are defined as in the question
ewma_data = np.zeros_like(streamflow_data, dtype=float)
alpha_high, alpha_low = 0.6, 0.3
# ewm_mean over the full series, sliced to [:i+1], equals ewm_mean over the
# prefix [:i+1], so one call per alpha replaces the per-element loop
ewma_high = streamflow_series.ewm_mean(alpha=alpha_high).to_numpy()
ewma_low = streamflow_series.ewm_mean(alpha=alpha_low).to_numpy()
ewma_high_index = np.flatnonzero(adaptive_alphas == alpha_high)
ewma_low_index = np.flatnonzero(adaptive_alphas == alpha_low)
alpha_zones = (find_continuous_intervals_vectorized(ewma_high_index)
               + find_continuous_intervals_vectorized(ewma_low_index))
for a_zone in alpha_zones:
    # Copy each contiguous zone from the series computed with that zone's alpha
    ewma_all = ewma_high if adaptive_alphas[a_zone[0]] == alpha_high else ewma_low
    ewma_data[a_zone[0]: a_zone[-1] + 1] = ewma_all[a_zone[0]: a_zone[-1] + 1]

Comments

0

There are a couple of issues in the code given:

  • The code repeats a lot of computation already done in earlier loop iterations. One correct algorithm would be the one you got from DeepSeek, but the devil is in the details.

  • You get integers instead of floats because np.zeros_like(X) takes the same dtype as X, which is integer here. That is not what you want when computing an exponential moving average, so you should use:

    ewma_data = np.zeros_like(streamflow_data, dtype='float32')

  • The polars documentation for ewm_mean shows that there are several ways to compute the EWM (see the adjust parameter). I don't know which one you want, but note that adjust=True is the default. If you use adjust=False in your code (and a float dtype as above), you will get the same results as DeepSeek.
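To make the adjust distinction concrete, here is a minimal pure-Python sketch of the two definitions as I understand them from the documentation, assuming no missing values. With adjust=True the result is a weighted mean with weights (1 - alpha)**k for lag k; with adjust=False it is the recursion y_t = (1 - alpha) * y_{t-1} + alpha * x_t starting from y_0 = x_0. The function names are hypothetical and only for illustration.

```python
def ewma_adjusted(values, alpha):
    """adjust=True style: weighted mean, weights (1 - alpha)**k for lag k."""
    out = []
    for t in range(len(values)):
        # Weight for values[i] at time t is (1 - alpha)**(t - i)
        weights = [(1.0 - alpha) ** (t - i) for i in range(t + 1)]
        total = sum(w * x for w, x in zip(weights, values))
        out.append(total / sum(weights))
    return out


def ewma_recursive(values, alpha):
    """adjust=False style: y_0 = x_0, y_t = (1 - alpha) * y_{t-1} + alpha * x_t."""
    out = [float(values[0])]
    for x in values[1:]:
        out.append((1.0 - alpha) * out[-1] + alpha * x)
    return out


data = [0, 1, 2, 3]
adjusted = ewma_adjusted(data, 0.3)
recursive = ewma_recursive(data, 0.3)
print(adjusted)
print(recursive)
```

For the second element the adjusted version gives about 0.588, which matches the float output in the question, while the recursion gives 0.3. That gap is the difference between the two conventions.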

Comments

-1

You can calculate the EWMA iteratively in a single pass over the data, without recalculating intermediate values:

import numpy as np

streamflow_data = np.arange(0, 20, 1)
adaptive_alphas = np.repeat(0.3, len(streamflow_data))

# Initialize the EWMA array
ewma_data = np.zeros_like(streamflow_data, dtype=float)

# Set the initial value
ewma_data[0] = streamflow_data[0]

# Compute EWMA recursively in a single pass (a plain loop, not vectorized)
for i in range(1, len(streamflow_data)):
    ewma_data[i] = (
        adaptive_alphas[i] * streamflow_data[i] + (1 - adaptive_alphas[i]) * ewma_data[i - 1]
    )

print(ewma_data)

I didn't try the code, but you can play around with it. You can also use the TA-Lib library.
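One caveat if the alphas actually change over time: a single-pass recursion carries forward a value built with the earlier alphas, whereas the loop in the question recomputes the whole prefix with the current alpha. Below is a small pure-Python sketch of that difference, assuming the non-adjusted recursion on both sides so that only the handling of alpha differs. The helper names are hypothetical.

```python
def recursive_ewma(values, alphas):
    """Single pass: each step applies its own alpha on top of the previous result."""
    out = [float(values[0])]
    for x, a in zip(values[1:], alphas[1:]):
        out.append(a * x + (1.0 - a) * out[-1])
    return out


def prefix_ewma(values, alphas):
    """Question-style: element i is the EWMA of values[:i+1] using alphas[i] throughout."""
    out = []
    for i, a in enumerate(alphas):
        constant = [a] * (i + 1)
        out.append(recursive_ewma(values[: i + 1], constant)[-1])
    return out


streamflow = list(range(20))
alphas = [0.3] * 10 + [0.6] * 10
single_pass = recursive_ewma(streamflow, alphas)
per_prefix = prefix_ewma(streamflow, alphas)
# Indices where the two definitions disagree
print([i for i in range(20) if abs(single_pass[i] - per_prefix[i]) > 1e-9])
```

While alpha stays constant the two agree, so any disagreement only shows up after the switch from 0.3 to 0.6.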

1 Comment

DeepSeek AI also gave me this result, but it's wrong. I will look at your recommendation later.
