Setting slice of column to list of values on polars dataframe

Question

In the code below I'm creating a polars and a pandas DataFrame with identical data. I want to select a set of rows based on a condition on column A, then update the corresponding rows of column C. I've included how I would do this with the pandas DataFrame, but I'm coming up short on how to get this working with polars. The closest I've gotten is using when-then-otherwise, but I'm unable to use anything other than a single value in then.

import pandas as pd
import polars as pl

df_pd = pd.DataFrame({'A': ['x', 'x', 'x', 'x', 'y', 'y', 'y', 'y'],
                      'B': [1, 1, 2, 2, 1, 1, 2, 2],
                      'C': [1, 2, 3, 4, 5, 6, 7, 8]})

df_pl = pl.DataFrame({'A': ['x', 'x', 'x', 'x', 'y', 'y', 'y', 'y'],
                      'B': [1, 1, 2, 2, 1, 1, 2, 2],
                      'C': [1, 2, 3, 4, 5, 6, 7, 8]})

df_pd.loc[df_pd['A'] == 'x', 'C'] = [-1, -2, -3, -4]

df_pl ???

Expected output:

┌─────┬─────┬─────┐
│ A   ┆ B   ┆ C   │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ x   ┆ 1   ┆ -1  │
│ x   ┆ 1   ┆ -2  │
│ x   ┆ 2   ┆ -3  │
│ x   ┆ 2   ┆ -4  │
│ y   ┆ 1   ┆ 5   │
│ y   ┆ 1   ┆ 6   │
│ y   ┆ 2   ┆ 7   │
│ y   ┆ 2   ┆ 8   │
└─────┴─────┴─────┘

Show what you expect the result of this operation to be (type it out manually if you can't code it).
– ticktalk, Commented Dec 16, 2024 at 23:51

Hericks · Accepted Answer · 2024-12-19 13:47:51Z

Actually, in-place updates similar to pandas are supported in polars. In particular, the following works as expected.

df_pl[[0, 1, 2, 3], "C"] = [-1, -2, -3, -4]

Instead of a list of indices, a DataFrame with a single integer column may also be passed. For example, we can do the following.

idx = df_pl.with_row_index().filter(pl.col("A") == "x").select("index")
df_pl[idx, "C"] = [-1, -2, -3, -4]

shape: (8, 3)
┌─────┬─────┬─────┐
│ A   ┆ B   ┆ C   │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ x   ┆ 1   ┆ -1  │
│ x   ┆ 1   ┆ -2  │
│ x   ┆ 2   ┆ -3  │
│ x   ┆ 2   ┆ -4  │
│ y   ┆ 1   ┆ 5   │
│ y   ┆ 1   ┆ 6   │
│ y   ┆ 2   ┆ 7   │
│ y   ┆ 2   ┆ 8   │
└─────┴─────┴─────┘

See this answer for an alternative solution using pl.DataFrame.update.

jqurious · Accepted Answer · 2025-07-10 11:59:21Z

If you wrap the Series of values in pl.lit, you can index into it with Expr.get.

values = pl.lit(pl.Series([-1, -2, -3, -4]))
idxs = pl.when(pl.col.A == 'x').then(1).cum_sum() - 1

df_pl.with_columns(C = pl.coalesce(values.get(idxs), 'C'))

shape: (8, 3)
┌─────┬─────┬─────┐
│ A   ┆ B   ┆ C   │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ x   ┆ 1   ┆ -1  │
│ x   ┆ 1   ┆ -2  │
│ x   ┆ 2   ┆ -3  │
│ x   ┆ 2   ┆ -4  │
│ y   ┆ 1   ┆ 5   │
│ y   ┆ 1   ┆ 6   │
│ y   ┆ 2   ┆ 7   │
│ y   ┆ 2   ┆ 8   │
└─────┴─────┴─────┘

These are the steps expanded.

The indices are created and passed to .get(); pl.coalesce() then fills the remaining nulls with the values from the original column.

df_pl.with_columns(
    idxs = idxs,
    values = values.get(idxs),
    D = pl.coalesce(values.get(idxs), 'C')
)

shape: (8, 6)
┌─────┬─────┬─────┬──────┬────────┬─────┐
│ A   ┆ B   ┆ C   ┆ idxs ┆ values ┆ D   │
│ --- ┆ --- ┆ --- ┆ ---  ┆ ---    ┆ --- │
│ str ┆ i64 ┆ i64 ┆ i32  ┆ i64    ┆ i64 │
╞═════╪═════╪═════╪══════╪════════╪═════╡
│ x   ┆ 1   ┆ 1   ┆ 0    ┆ -1     ┆ -1  │
│ x   ┆ 1   ┆ 2   ┆ 1    ┆ -2     ┆ -2  │
│ x   ┆ 2   ┆ 3   ┆ 2    ┆ -3     ┆ -3  │
│ x   ┆ 2   ┆ 4   ┆ 3    ┆ -4     ┆ -4  │
│ y   ┆ 1   ┆ 5   ┆ null ┆ null   ┆ 5   │
│ y   ┆ 1   ┆ 6   ┆ null ┆ null   ┆ 6   │
│ y   ┆ 2   ┆ 7   ┆ null ┆ null   ┆ 7   │
│ y   ┆ 2   ┆ 8   ┆ null ┆ null   ┆ 8   │
└─────┴─────┴─────┴──────┴────────┴─────┘

Another option is to get the row index of each True value, e.g. using pl.arg_where().

You can then generate a row index and use .replace_strict() to map those positions to the new values, with column C as the default.

df_pl.with_columns(
    pl.int_range(pl.len()).replace_strict(
        pl.arg_where(pl.col.A == "x"),
        [-1, -2, -3, -4],
        default = pl.col.C
    )
    .alias("C")
)

shape: (8, 3)
┌─────┬─────┬─────┐
│ A   ┆ B   ┆ C   │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ x   ┆ 1   ┆ -1  │
│ x   ┆ 1   ┆ -2  │
│ x   ┆ 2   ┆ -3  │
│ x   ┆ 2   ┆ -4  │
│ y   ┆ 1   ┆ 5   │
│ y   ┆ 1   ┆ 6   │
│ y   ┆ 2   ┆ 7   │
│ y   ┆ 2   ┆ 8   │
└─────┴─────┴─────┘

Henry Harbeck · Accepted Answer · 2024-12-21 11:10:00Z

~~In Polars, there is not really a notion of assigning to a slice of a DataFrame.~~

Edit: the above statement was incorrect. See the answer by @Hericks for how this can be achieved. Do note, though, that doing so is not considered idiomatic in Polars.

Also, in when/then/otherwise, Polars expects the lengths of everything to be compatible. They all have to be the same length, or be scalars that are then broadcast.

With those things in mind, here are a few options:

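Given you know that there are 4 "x" values in column A, you can split the df, update the column, and concat the results back together. This works regardless of which rows the 4 "x" values are in, but note that the "x" rows come first in the result, so the original row order is only preserved if they were already at the top.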

pl.concat([
  df_pl.filter(pl.col("A") == "x").with_columns(C=pl.Series([-1, -2, -3, -4])),
  df_pl.filter(pl.col("A") != "x"),
])

If you also know that the "x" rows are the first 4 rows, you can pad the new values with nulls and then use when/then/otherwise or coalesce. This only works when you know they are the first 4 rows.

new_values = [-1, -2, -3, -4]
new_c = pl.Series(new_values).extend_constant(None, df_pl.height - len(new_values))
df_pl.with_columns(C=pl.coalesce(new_c, "C"))

On your example data, both of the above snippets output

shape: (8, 3)
┌─────┬─────┬─────┐
│ A   ┆ B   ┆ C   │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ x   ┆ 1   ┆ -1  │
│ x   ┆ 1   ┆ -2  │
│ x   ┆ 2   ┆ -3  │
│ x   ┆ 2   ┆ -4  │
│ y   ┆ 1   ┆ 5   │
│ y   ┆ 1   ┆ 6   │
│ y   ┆ 2   ┆ 7   │
│ y   ┆ 2   ┆ 8   │
└─────┴─────┴─────┘

Note to anyone else reading this answer: if you only need to assign a scalar (literal value), or you have a new list the same length as the DataFrame, just use a plain when/then/otherwise, as outlined in the user guide and in the docs, instead of the suggestions above.

roman · Accepted Answer · 2024-12-17 08:52:41Z

If you don't know the positions of your x values, you could generate a row index on the fly and use it. For example, with pl.DataFrame.update():

new_values = [-1, -2, -3, -4]

(
    df_pl
    .with_columns(index = pl.int_range(pl.len()).over("A").cast(pl.UInt32))
    .update(
        pl.DataFrame({"A": "x", "C": new_values}).with_row_index(),
        on=["A","index"],
        how="left"
    )
    .drop("index")
)

shape: (8, 3)
┌─────┬─────┬─────┐
│ A   ┆ B   ┆ C   │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ x   ┆ 1   ┆ -1  │
│ x   ┆ 1   ┆ -2  │
│ x   ┆ 2   ┆ -3  │
│ x   ┆ 2   ┆ -4  │
│ y   ┆ 1   ┆ 5   │
│ y   ┆ 1   ┆ 6   │
│ y   ┆ 2   ┆ 7   │
│ y   ┆ 2   ┆ 8   │
└─────┴─────┴─────┘

Or something like this:

(
    df_pl.with_row_index()
    .update(
        pl.DataFrame({
            "C": pl.Series(new_values),
            "index": df_pl.select((pl.col.A == "x").arg_true())
        }),
        on=["index"],
        how="left"
    )
    .drop("index")
)

shape: (8, 3)
┌─────┬─────┬─────┐
│ A   ┆ B   ┆ C   │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ x   ┆ 1   ┆ -1  │
│ x   ┆ 1   ┆ -2  │
│ x   ┆ 2   ┆ -3  │
│ x   ┆ 2   ┆ -4  │
│ y   ┆ 1   ┆ 5   │
│ y   ┆ 1   ┆ 6   │
│ y   ┆ 2   ┆ 7   │
│ y   ┆ 2   ┆ 8   │
└─────┴─────┴─────┘

If you know that the x rows are positioned at the beginning of the DataFrame, you can do:

df_pl.with_columns(
    pl.Series(new_values)
    .append(df_pl["C"].tail(-len(new_values)))
    .alias("C")
)

And if x might not be at the front of the DataFrame, but you don't care about the original order of the rows, you can sort it first:

# Sort once and keep the result, so that the tail of C is taken
# from the sorted frame rather than from the original df_pl.
df_sorted = df_pl.sort(
    pl.when(pl.col.A == "x").then(0).otherwise(1),
    maintain_order = True
)

df_sorted.with_columns(
    pl.Series(new_values)
    .append(df_sorted["C"].tail(-len(new_values)))
    .alias("C")
)
