I've noticed some unexpected behavior with the interpolate_by expression and I'm not sure what is going on.
import polars as pl

df = pl.DataFrame({
    'a': [1, 2, 3, 4, 5],
    'b': [4, 5, None, 7, 8],
})
df = df.with_columns(interpolate=pl.col('b').interpolate_by('a'))
print(df)
results in this:
┌─────┬──────┬─────────────┐
│ a   ┆ b    ┆ interpolate │
│ --- ┆ ---  ┆ ---         │
│ i64 ┆ i64  ┆ f64         │
╞═════╪══════╪═════════════╡
│ 1   ┆ 4    ┆ 4.0         │
│ 2   ┆ 5    ┆ 5.0         │
│ 3   ┆ null ┆ 6.0         │
│ 4   ┆ 7    ┆ 7.0         │
│ 5   ┆ 8    ┆ 8.0         │
└─────┴──────┴─────────────┘
which is correct. However this:
df = pl.DataFrame({
    'a': [1, 2, 3, 4, 5],
    'b': [4, 5, 6, 7, None],
})
df = df.with_columns(interpolate=pl.col('b').interpolate_by('a'))
print(df)
results in this:
shape: (5, 3)
┌─────┬──────┬─────────────┐
│ a   ┆ b    ┆ interpolate │
│ --- ┆ ---  ┆ ---         │
│ i64 ┆ i64  ┆ f64         │
╞═════╪══════╪═════════════╡
│ 1   ┆ 4    ┆ 4.0         │
│ 2   ┆ 5    ┆ 5.0         │
│ 3   ┆ 6    ┆ 6.0         │
│ 4   ┆ 7    ┆ 7.0         │
│ 5   ┆ null ┆ null        │
└─────┴──────┴─────────────┘
which is not correct. There is still plenty of data to perform a linear interpolation on column b using the values in column a. Am I misunderstanding how this is supposed to work, or is this a bug?
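To make the expected value concrete, here is a minimal sketch in plain Python (not polars) of the result I had in mind for the last row. The helper name `extend_trailing` is hypothetical and is not part of any library. It assumes the trailing null should be filled by extending the straight line through the last two known points, which is strictly speaking extrapolation, since a = 5 lies beyond the last known value of b.

```python
def extend_trailing(xs, ys):
    """Fill None values that come after the last known y by extending
    the line through the last two known (x, y) points.

    Hypothetical helper for illustration only; not a polars API.
    """
    known = [(x, y) for x, y in zip(xs, ys) if y is not None]
    (x0, y0), (x1, y1) = known[-2], known[-1]
    slope = (y1 - y0) / (x1 - x0)

    out = []
    for x, y in zip(xs, ys):
        if y is not None:
            out.append(float(y))
        elif x > x1:
            # Beyond the last known point: extend the final line segment.
            out.append(y1 + slope * (x - x1))
        else:
            # Interior or leading gaps are left untouched in this sketch.
            out.append(None)
    return out


a = [1, 2, 3, 4, 5]
b = [4, 5, 6, 7, None]
expected = extend_trailing(a, b)
print(expected)
```

With the data from the second example, the last entry of `expected` is 8.0, which is the value I expected to see in the interpolate column instead of null.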