How to eliminate the loop in my code when calculating EWMA?

Question

I'm calculating EWMA values for an array of streamflow data, and my code looks like this:

import polars as pl
import numpy as np

streamflow_data = np.arange(0, 20, 1)
adaptive_alphas = np.concatenate([np.repeat(0.3, 10), np.repeat(0.6, 10)])
streamflow_series = pl.Series(streamflow_data)
ewma_data = np.zeros_like(streamflow_data)
for i in range(1, len(streamflow_series)):
    current_alpha = adaptive_alphas[i]
    ewma_data[i] = streamflow_series[:i+1].ewm_mean(alpha=current_alpha)[-1]

# When ewma_data is initialized with a float dtype, the output is:
Output: [0  0.58823529  1.23287671  1.93051717  2.67678771  3.46668163  4.29488309  5.1560635   6.04512113  6.95735309  9.33379473 10.33353466 11.33342058 12.33337091 13.33334944 14.33334021 15.33333625 16.33333457 17.33333386 18.33333355]

# When no dtype is given for ewma_data and streamflow_data is int, the output is truncated to integers:
Output: [0  0  1  1  2  3  4  5  6  6  9 10 11 12 13 14 15 16 17 18]

But when streamflow_data is very long (for example, more than 100000 elements), this code becomes very slow.

So how can I eliminate the for loop from my code without changing its result?

I look forward to your reply.

Are the adaptive alphas actually adaptive or are they constant as in the example?
– Hericks, Commented Jan 14 at 18:28
@forestbat it would be nice to have an example with different alpha values and proper, unrounded output, so people can see whether their approach works.
– roman, Commented Jan 15 at 9:58
If you change part of your code to np.zeros_like(streamflow_data, dtype=float), then you'll have float results in ewma_data, and the results are then the same as in my code.
– roman, Commented Jan 15 at 10:20

roman · Accepted Answer · 2025-01-15 11:06:40Z

3

If you have only a few alpha values and/or some condition that determines which alpha should be used for which row, you could use pl.coalesce(), pl.when() and pl.Expr.ewm_mean():

import numpy as np
import polars as pl

df = pl.DataFrame({
    "adaptive_alpha": np.concatenate([np.repeat(0.3, 10), np.repeat(0.6, 10)]),
    "streamflow": np.arange(0, 20, 1)
})

df.with_columns(
    pl.coalesce(
        pl.when(pl.col.adaptive_alpha == alpha)
        .then(pl.col.streamflow.ewm_mean(alpha = alpha))
        for alpha in df["adaptive_alpha"].unique()
    ).alias("ewma")
).with_columns(ewma_int = pl.col.ewma.cast(pl.Int32))

shape: (20, 4)
┌────────────────┬────────────┬───────────┬──────────┐
│ adaptive_alpha ┆ streamflow ┆ ewma      ┆ ewma_int │
│ ---            ┆ ---        ┆ ---       ┆ ---      │
│ f64            ┆ i64        ┆ f64       ┆ i32      │
╞════════════════╪════════════╪═══════════╪══════════╡
│ 0.3            ┆ 0          ┆ 0.0       ┆ 0        │
│ 0.3            ┆ 1          ┆ 0.588235  ┆ 0        │
│ 0.3            ┆ 2          ┆ 1.232877  ┆ 1        │
│ 0.3            ┆ 3          ┆ 1.930517  ┆ 1        │
│ 0.3            ┆ 4          ┆ 2.676788  ┆ 2        │
│ …              ┆ …          ┆ …         ┆ …        │
│ 0.6            ┆ 15         ┆ 14.33334  ┆ 14       │
│ 0.6            ┆ 16         ┆ 15.333336 ┆ 15       │
│ 0.6            ┆ 17         ┆ 16.333335 ┆ 16       │
│ 0.6            ┆ 18         ┆ 17.333334 ┆ 17       │
│ 0.6            ┆ 19         ┆ 18.333334 ┆ 18       │
└────────────────┴────────────┴───────────┴──────────┘

edited Jan 15 at 11:06

answered Jan 15 at 9:01

roman


3 Comments

forestbat Jan 15 at 9:22

Thanks for your reply, but it calculates two sequences independently rather than iterating over one array. Your algorithm's result is [0, 0.58, ..., 17.33, 18.33], but the answer from the original algorithm is [0 0 1 1 2 3 4 5 6 6 9 10 11 12 13 14 15 16 17 18]. I have my own answer now, and I will post it under this question.

roman Jan 15 at 9:46

@forestbat there's rounding (or rather flooring) happening in your example. Is that what you want?

forestbat Jan 15 at 9:56

It only happens in this demo; in the actual program there is no rounding.

forestbat · Accepted Answer · 2025-01-15 09:28:55Z

import numpy as np
import polars as pl

def find_continuous_intervals_vectorized(arr):
    # Split a sorted index array into runs of consecutive indices.
    if len(arr) == 0:
        return []
    diffs = np.diff(arr)
    boundaries = np.where(diffs != 1)[0]
    boundaries = np.concatenate(([-1], boundaries, [len(arr) - 1]))
    intervals = np.split(arr, boundaries + 1)
    # Drop the empty pieces created by the sentinel boundaries, but keep single-index runs.
    intervals = [interval for interval in intervals if len(interval) > 0]
    return intervals

# streamflow_data, streamflow_series and adaptive_alphas are defined as in the question.
ewma_data = np.zeros_like(streamflow_data, dtype=float)  # float, so values are not truncated
alpha_high, alpha_low = 0.6, 0.3
ewma_high = streamflow_series.ewm_mean(alpha=alpha_high).to_numpy()
ewma_low = streamflow_series.ewm_mean(alpha=alpha_low).to_numpy()
ewma_high_index = np.flatnonzero(adaptive_alphas == alpha_high)
ewma_low_index = np.flatnonzero(adaptive_alphas == alpha_low)
alpha_zones_high = find_continuous_intervals_vectorized(ewma_high_index)
alpha_zones_low = find_continuous_intervals_vectorized(ewma_low_index)
alpha_zones = alpha_zones_high + alpha_zones_low
# np.array_equal(streamflow_series.ewm_mean(alpha=current_alpha)[:i+1].to_numpy(), streamflow_series[:i+1].ewm_mean(alpha=current_alpha).to_numpy()) = True
for a_zone in alpha_zones:
    ewma_all = ewma_high if adaptive_alphas[a_zone[0]] == alpha_high else ewma_low
    ewma_data[a_zone[0]: a_zone[-1]+1] = ewma_all[a_zone[0]: a_zone[-1]+1]
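To make the interval step above concrete, here is a small sketch of what splitting an index array into runs of consecutive indices looks like. The function name split_into_runs and the sample alphas are made up for illustration and are not part of the answer.

```python
import numpy as np

def split_into_runs(indices):
    """Split a sorted 1-D index array into runs of consecutive integers."""
    indices = np.asarray(indices)
    if indices.size == 0:
        return []
    # A new run starts right after every position where the step is not 1.
    breaks = np.flatnonzero(np.diff(indices) != 1) + 1
    return np.split(indices, breaks)

alphas = np.array([0.6, 0.6, 0.3, 0.6, 0.3, 0.3, 0.6, 0.6])
high_idx = np.flatnonzero(alphas == 0.6)  # positions 0, 1, 3, 6, 7
runs = split_into_runs(high_idx)
print([run.tolist() for run in runs])     # [[0, 1], [3], [6, 7]]
```

Note the single-index run [3]: a filter such as len(interval) > 1 would discard it, and that position of the output array would then keep its initial zero.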

rehaqds · Accepted Answer · 2025-01-15 09:46:08Z

There are a couple of issues in the code given:

The code repeats a lot of computation already done in previous loop iterations. One correct algorithm would be the one you got from DeepSeek. But the devil is in the details.

You get integers instead of floats because np.zeros_like(X) takes the same dtype as X, so integers here, which is not what you want when computing an exponential moving average. So you should use:

ewma_data = np.zeros_like(streamflow_data, dtype='float32')

In the polars documentation for ewm_mean one can see that there are several options for computing the EWM (see the adjust parameter). I don't know which one you want, but notice that by default adjust=True. If you use adjust=False in your code (and use float as above) you will get the same results as DeepSeek.
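As a rough illustration of the two conventions, the sketch below writes both out in plain NumPy. It assumes the usual definitions (adjust=True as a weighted mean with weights (1 - alpha)**k at lag k, adjust=False as the recursion y[t] = alpha*x[t] + (1 - alpha)*y[t-1] with y[0] = x[0]); the helper names ewm_adjusted and ewm_recursive are hypothetical, and the loops are for readability rather than speed.

```python
import numpy as np

def ewm_adjusted(x, alpha):
    """adjust=True form: weighted mean with weights (1 - alpha)**k at lag k."""
    num, den, out = 0.0, 0.0, []
    for value in x:
        num = value + (1.0 - alpha) * num  # weighted sum of observations
        den = 1.0 + (1.0 - alpha) * den    # sum of the weights
        out.append(num / den)
    return np.array(out)

def ewm_recursive(x, alpha):
    """adjust=False form: y[0] = x[0], y[t] = alpha*x[t] + (1 - alpha)*y[t-1]."""
    out = [float(x[0])]
    for value in x[1:]:
        out.append(alpha * value + (1.0 - alpha) * out[-1])
    return np.array(out)

x = np.arange(0, 5)
print(ewm_adjusted(x, 0.3))   # second value is 1 / 1.7, about 0.588, as in the question
print(ewm_recursive(x, 0.3))  # second value is 0.3
```

The two forms start from different weights, so their early values differ noticeably, which would explain why a simple recursion does not reproduce the question's output.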

ricardo · Accepted Answer · 2025-01-14 18:01:31Z

-1

You can calculate the EWMA iteratively in a single pass over the data, without recalculating intermediate values:

import numpy as np

streamflow_data = np.arange(0, 20, 1)
adaptive_alphas = np.repeat(0.3, len(streamflow_data))

# Initialize the EWMA array
ewma_data = np.zeros_like(streamflow_data, dtype=float)

# Set the initial value
ewma_data[0] = streamflow_data[0]

# Compute EWMA recursively in a single pass (this is still a Python loop, not vectorized)
for i in range(1, len(streamflow_data)):
    ewma_data[i] = (
        adaptive_alphas[i] * streamflow_data[i] + (1 - adaptive_alphas[i]) * ewma_data[i - 1]
    )

print(ewma_data)

I haven't tested the code, but you can play around with it. You could also use the TA-Lib library.

answered Jan 14 at 18:01

ricardo


1 Comment

forestbat Jan 14 at 18:11

DeepSeek AI also gave me this result, but it's wrong. I will look into your recommendation later.
