
I have this dataframe:

import polars as pl

df = pl.DataFrame({'value': [1,2,3,4,5,None,None], 'flag': [0,1,1,1,0,0,0]})
┌───────┬──────┐
│ value ┆ flag │
│   --- ┆  --- │
│   i64 ┆  i64 │
╞═══════╪══════╡
│     1 ┆    0 │
│     2 ┆    1 │
│     3 ┆    1 │
│     4 ┆    1 │
│     5 ┆    0 │
│  null ┆    0 │
│  null ┆    0 │
└───────┴──────┘

I want to use df.with_columns(pl.col('value').forward_fill()) (or similar), but I only want values with flag == 1 to be eligible for filling. So in this example, I want the value 4 to replace the two null entries (rather than 5):

┌───────┬──────┐
│ value ┆ flag │
│ ---   ┆ ---  │
│ i64   ┆ i64  │
╞═══════╪══════╡
│ 1     ┆ 0    │
│ 2     ┆ 1    │
│ 3     ┆ 1    │
│ 4     ┆ 1    │
│ 5     ┆ 0    │
│ 4     ┆ 0    │
│ 4     ┆ 0    │
└───────┴──────┘

How can one achieve this?

3 Answers

Answer 1

You can mask the original column with the flag, forward fill the masked column, and then coalesce the original column with the result. In the example below I've replaced 2 with None to show that values whose flag is 0 aren't eligible for forward filling: the null in the second row is not filled with the 1 above it, because that row's flag is 0.

import polars as pl

df = pl.DataFrame(
    {"value": [1, None, 3, 4, 5, None, None], "flag": [0, 1, 1, 1, 0, 0, 0]}
)

df.with_columns(
    pl.coalesce(
        "value",
        pl.when(pl.col("flag") == 1)
        .then(pl.col("value"))
        .forward_fill(),
    ).alias("ffill")
)
shape: (7, 3)
┌───────┬──────┬───────┐
│ value ┆ flag ┆ ffill │
│ ---   ┆ ---  ┆ ---   │
│ i64   ┆ i64  ┆ i64   │
╞═══════╪══════╪═══════╡
│ 1     ┆ 0    ┆ 1     │
│ null  ┆ 1    ┆ null  │
│ 3     ┆ 1    ┆ 3     │
│ 4     ┆ 1    ┆ 4     │
│ 5     ┆ 0    ┆ 5     │
│ null  ┆ 0    ┆ 4     │
│ null  ┆ 0    ┆ 4     │
└───────┴──────┴───────┘
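To make the three steps concrete, here is a plain-Python model of the same logic on lists, with None standing in for null. It is a sketch for illustration, not polars code; the helper names forward_fill and flagged_ffill are hypothetical.

```python
from __future__ import annotations

from typing import Optional, Sequence


def forward_fill(values: Sequence[Optional[int]]) -> list[Optional[int]]:
    """Replace each None with the most recent non-None value before it."""
    out: list[Optional[int]] = []
    last: Optional[int] = None
    for v in values:
        if v is not None:
            last = v
        out.append(last)
    return out


def flagged_ffill(values: Sequence[Optional[int]], flags: Sequence[int]) -> list[Optional[int]]:
    # Step 1: mask, like pl.when(flag == 1).then(value) with an implicit otherwise(None)
    masked = [v if f == 1 else None for v, f in zip(values, flags)]
    # Step 2: forward fill the masked column
    filled = forward_fill(masked)
    # Step 3: coalesce, i.e. keep the original value if present, else the filled one
    return [v if v is not None else m for v, m in zip(values, filled)]


print(flagged_ffill([1, None, 3, 4, 5, None, None], [0, 1, 1, 1, 0, 0, 0]))
# [1, None, 3, 4, 5, 4, 4]
```

A null with no earlier flagged value stays None, which matches the null in the second row of the polars output above.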

1 Comment

You can omit the otherwise(None), and OP specified that flag == 1 (rather than flag != 0) is the condition for being eligible for filling.

Answer 2

You can create an expression that replaces flag with 1 wherever value is null (let's call it flag_group). Then apply .forward_fill() over each flag_group: the nulls end up in the same group as the flag == 1 rows, so only values with flag == 1 are eligible for filling them.

import polars as pl

df = pl.DataFrame({"value": [1, 2, 3, 4, 5, None, None], "flag": [0, 1, 1, 1, 0, 0, 0]})

flag_group = pl.when(pl.col("value").is_null()).then(1).otherwise(pl.col("flag"))

res = df.with_columns(value_ffill=pl.col("value").forward_fill().over(flag_group))

print(res)

Output:

shape: (7, 3)
┌───────┬──────┬─────────────┐
│ value ┆ flag ┆ value_ffill │
│ ---   ┆ ---  ┆ ---         │
│ i64   ┆ i64  ┆ i64         │
╞═══════╪══════╪═════════════╡
│ 1     ┆ 0    ┆ 1           │
│ 2     ┆ 1    ┆ 2           │
│ 3     ┆ 1    ┆ 3           │
│ 4     ┆ 1    ┆ 4           │
│ 5     ┆ 0    ┆ 5           │
│ null  ┆ 0    ┆ 4           │
│ null  ┆ 0    ┆ 4           │
└───────┴──────┴─────────────┘
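For intuition about what .over(flag_group) does here, the following plain-Python sketch forward fills each group independently while keeping rows in their original order. The helper ffill_over_groups is hypothetical and only models the polars window expression; it is not part of polars.

```python
def ffill_over_groups(values, keys):
    """Forward fill within each group, modelling Expr.forward_fill().over(keys).

    Rows keep their original positions; each group remembers its own last non-null value.
    """
    last_seen = {}
    out = []
    for v, k in zip(values, keys):
        if v is not None:
            last_seen[k] = v
        out.append(last_seen.get(k))  # None if the group has no value yet
    return out


values = [1, 2, 3, 4, 5, None, None]
flags = [0, 1, 1, 1, 0, 0, 0]

# flag_group: 1 where value is null, otherwise the row's own flag
flag_group = [1 if v is None else f for v, f in zip(values, flags)]
print(flag_group)                               # [0, 1, 1, 1, 0, 1, 1]
print(ffill_over_groups(values, flag_group))    # [1, 2, 3, 4, 5, 4, 4]
```

Because every null lands in group 1, it can only pick up the last non-null value seen in that group, and those values always come from rows with flag == 1.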

Answer 3

I misunderstood it at first, assuming you wanted a literal 4 instead of keeping the last flagged value.

Updated:

import polars as pl

df = pl.DataFrame({"value": [1, 2, 3, 4, 5, None, None], "flag": [0, 1, 1, 1, 0, 0, 0]})

# Mask out non-flagged values so they are never used for filling
eligible = pl.col("flag") == 1
masked_values = pl.when(eligible).then(pl.col("value"))  # implicit otherwise(None)

# Carry the last non-null flagged value forward
ffill = masked_values.forward_fill()

# Use fill_null so that non-null values are kept as they are, regardless of the flag
filled = pl.col("value").fill_null(ffill)
# (The line above is a more concise form of this equivalent expression)
filled = pl.when(pl.col("value").is_null()).then(ffill).otherwise(pl.col("value"))

df.with_columns(filled)
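As a sanity check, here is a plain-Python model comparing this masking approach with the flag_group/.over() approach from another answer on every short input over a small set of values. The functions masked_ffill and grouped_ffill are hypothetical stand-ins for the polars expressions, not polars itself; under this model the two produce the same column.

```python
from itertools import product


def masked_ffill(values, flags):
    # Model of: pl.col("value").fill_null(pl.when(flag == 1).then(value).forward_fill())
    out, last = [], None
    for v, f in zip(values, flags):
        if v is not None and f == 1:
            last = v  # only flagged, non-null values update the fill value
        out.append(v if v is not None else last)
    return out


def grouped_ffill(values, flags):
    # Model of: pl.col("value").forward_fill().over(when(value.is_null()).then(1).otherwise(flag))
    last_seen, out = {}, []
    for v, f in zip(values, flags):
        key = 1 if v is None else f
        if v is not None:
            last_seen[key] = v
        out.append(last_seen.get(key))
    return out


# Compare both models on every length-5 input built from a small alphabet
for values in product([None, 7, 8], repeat=5):
    for flags in product([0, 1], repeat=5):
        assert masked_ffill(values, flags) == grouped_ffill(values, flags)
print("both approaches agree")
```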

Original answer

You could use when().then().otherwise() for that, for example:

ffill = pl.col('value').fill_null(strategy="forward")
literal = pl.col('value').fill_null(4)

df.with_columns(pl.when(pl.col("flag") == 1).then(ffill).otherwise(literal))

However, note that this might fill values through 'gaps' in the flags, for example:

df = pl.DataFrame({"value": [1, None, None], "flag": [1, 0, 1]})
>>> df.with_columns(pl.when(pl.col("flag") == 1).then(ffill).otherwise(literal))
shape: (3, 2)
┌───────┬──────┐
│ value ┆ flag │
│ ---   ┆ ---  │
│ i64   ┆ i64  │
╞═══════╪══════╡
│ 1     ┆ 1    │
│ 4     ┆ 0    │
│ 1     ┆ 1    │
└───────┴──────┘

If you want to avoid that, you can use .over() so that the fill does not cross the boundary of each 'group' (each run of identical flag values, as given by rle_id()), for example:

flag_groups = pl.col("flag").rle_id()
ffill_groups = ffill.over(flag_groups).fill_null(4)

df.with_columns(pl.when(pl.col("flag") == 1).then(ffill_groups).otherwise(literal))

Note that doing so will keep null values at the start of each group, so you may need to call fill_null() again afterwards.
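The difference between the ungrouped and the rle_id()-grouped versions on the small example above can be modelled in plain Python. This is a sketch: rle_id, ffill_over and fill_null below are hypothetical stand-ins for the polars expressions, not the library itself.

```python
def rle_id(xs):
    # Models Expr.rle_id(): a run index that increments whenever the value changes
    ids, run = [], 0
    for i, x in enumerate(xs):
        if i > 0 and x != xs[i - 1]:
            run += 1
        ids.append(run)
    return ids


def ffill_over(values, keys):
    # Forward fill restricted to each group, modelling .over(keys)
    last, out = {}, []
    for v, k in zip(values, keys):
        if v is not None:
            last[k] = v
        out.append(last.get(k))
    return out


def fill_null(values, fill):
    return [fill if v is None else v for v in values]


values = [1, None, None]
flags = [1, 0, 1]

ffill = ffill_over(values, [0] * len(values))  # plain forward fill: a single group
literal = fill_null(values, 4)
ungrouped = [a if f == 1 else b for a, b, f in zip(ffill, literal, flags)]
print(ungrouped)  # [1, 4, 1]: the last row is filled "through" the flag-0 gap

groups = rle_id(flags)  # [0, 1, 2]
ffill_groups = fill_null(ffill_over(values, groups), 4)
grouped = [a if f == 1 else b for a, b, f in zip(ffill_groups, literal, flags)]
print(grouped)  # [1, 4, 4]
```

In the grouped version the forward fill leaves the last row null (it starts a new run), and the trailing fill_null(4) is what fills it.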

2 Comments

I think there is an issue here, because you hardcoded the value 4 in fill_null(4), no?
In pl.col('value').fill_null(4), the 4 is only used for the otherwise() branch of when(). In ffill.over(flag_groups).fill_null(4), it fills in values that are still missing after the forward fill over groups (this is the "may need to fill_null() again afterwards" I mention at the end). If that is giving incorrect results, please provide a more complete set of example inputs and outputs. Edit: Oh, never mind, I misunderstood it. I'll update the answer.
