I have a weight vector:
import polars as pl
weight_vec = pl.Series("weights", [0.125, 0.0625, 0.03125])
And also a DataFrame containing up to m variables. For simplicity, we will only have two variables:
df = pl.DataFrame(
    {
        "row_index": [0, 1, 2, 3, 4],
        "var1": [1, 2, 3, 4, 5],
        "var2": [6, 7, 8, 9, 10],
    }
)
The size (number of observations) for these variables can be very large (tens of millions of rows).
I would like to:
For each variable and each observation x_i, where i is the row index [0, ..., 4], I want to replace x_i with the sumproduct of the window [x_i, ..., x_(i+n-1)] (the current value and the n-1 values that follow it) and the weight vector. n is the length of the given weight vector, and n varies between weight vector definitions.
Numerically, the value of var1 at row index 0 is the sumproduct of [x_0, x_1, x_2] and the values of the weight vector. When the row index approaches the end (i.e., max index - row index + 1 < n), the value is set to None.
We can assume that the height of the DataFrame is always greater than or equal to the length of the weight vector, so there is at least one valid result.
The resulting DataFrame should look like this:
shape: (5, 3)
┌───────────┬─────────┬─────────┐
│ row_index ┆ var1    ┆ var2    │
│ ---       ┆ ---     ┆ ---     │
│ i64       ┆ f64     ┆ f64     │
╞═══════════╪═════════╪═════════╡
│ 0         ┆ 0.34375 ┆ 1.4375  │
│ 1         ┆ 0.5625  ┆ 1.65625 │
│ 2         ┆ 0.78125 ┆ 1.875   │
│ 3         ┆ null    ┆ null    │
│ 4         ┆ null    ┆ null    │
└───────────┴─────────┴─────────┘
Numeric calculations:
- x_0_var1: (0.125 * 1 + 0.0625 * 2 + 0.03125 * 3 = 0.34375)
- x_2_var2: (0.125 * 8 + 0.0625 * 9 + 0.03125 * 10 = 1.875)
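For reference, the same window arithmetic can be reproduced for every row with a plain-Python loop. This is only a sketch for checking the expected numbers, not the vectorized solution being asked for; `forward_sumproduct` is a hypothetical helper name, not a Polars function.

```python
def forward_sumproduct(values, weights):
    """Sumproduct of weights with values[i : i + n]; None where the window runs past the end."""
    n = len(weights)
    out = []
    for i in range(len(values)):
        window = values[i : i + n]  # current value plus the n-1 values after it
        if len(window) < n:
            out.append(None)  # incomplete window near the end of the column
        else:
            out.append(sum(w * x for w, x in zip(weights, window)))
    return out


weights = [0.125, 0.0625, 0.03125]
var1_expected = forward_sumproduct([1, 2, 3, 4, 5], weights)
var2_expected = forward_sumproduct([6, 7, 8, 9, 10], weights)
print(var1_expected)
print(var2_expected)
```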
I am looking for a memory efficient, vectorized Polars operation to achieve such results.
(1 * 0.125) + (2 * 0.0625) + (3 * 0.03125) = 0.34375. I've tried various ways (product then sum, sum then product), but am not able to reach your expected output.
[1, 2, 3, 4, 5] - first uses [1,2,3] (var1[0:0+3]), second uses [2,3,4] (var1[1:1+3]), third uses [3,4,5] (var1[2:2+3]). But you are right, OP should describe it better and show more calculations.
(np.array([0.125, 0.0625, 0.03125]) * [1, 2, 3]).sum() or more like (np.array(weight_vec) * row[row_index:row_index+3]).sum()