18

Is there an elegant way how to recode values in polars dataframe.

For example

1->0, 
2->0, 
3->1... 

in Pandas it is simple like that:

df.replace([1,2,3,4,97,98,99],[0,0,1,1,2,2,2])

6 Answers 6

30

Edit 2024-07-09

Polars has dedicated replace and replace_strict expressions.

df = pl.DataFrame({
    "a": [1, 2, 3, 4, 5]
})

mapper = {
    1: 0,
    2: 0,
    3: 10,
    4: 10
}

df.select(
    pl.all().replace(mapper)
)
shape: (5, 1)
┌─────┐
│ a   │
│ --- │
│ i64 │
╞═════╡
│ 0   │
│ 0   │
│ 10  │
│ 10  │
│ 5   │
└─────┘

Before Edit

In polars you can build columnar if else statetements called if -> then -> otherwise expressions.

So let's say we have this DataFrame.

df = pl.DataFrame({
    "a": [1, 2, 3, 4, 5]
})

And we'd like to replace these with the following values:

from_ = [1, 2]
to_ = [99, 12]

We could write:

df.with_columns(
    pl.when(pl.col("a") == from_[0])
    .then(to_[0])
    .when(pl.col("a") == from_[1])
    .then(to_[1])
    .otherwise(pl.col("a")).alias("a")
)
shape: (5, 1)
┌─────┐
│ a   │
│ --- │
│ i64 │
╞═════╡
│ 99  │
│ 12  │
│ 3   │
│ 4   │
│ 5   │
└─────┘

Don't repeat yourself

Now, this becomes very tedious to write really fast, so we could write a function that generates these expressions for use, we are programmers aren't we!

So to replace with the values you have suggested, you could do:

from_ = [1,2,3,4,97,98,99]
to_ = [0,0,1,1,2,2,2]


def replace(column, from_, to_):
    # initiate the expression with `pl.when`
    branch = pl.when(pl.col(column) == from_[0]).then(to_[0])

    
    # for every value add a `when.then`
    for (from_value, to_value) in zip(from_, to_):
        branch = branch.when(pl.col(column) == from_value).then(to_value)

    # finish with an `otherwise`
    return branch.otherwise(pl.col(column)).alias(column)
    


df.with_columns(replace("a", from_, to_))

Which outputs:

shape: (5, 1)
┌─────┐
│ a   │
│ --- │
│ i64 │
╞═════╡
│ 0   │
│ 0   │
│ 1   │
│ 1   │
│ 5   │
└─────┘
Sign up to request clarification or add additional context in comments.

1 Comment

You can pass two identical size lists as old and new parameters to a polars.Series. Something like series.replace([1,2,3], [0,0,1]). While dealing with data frames, it becomes, import polars as pl; df = df.with_columns(pl.col("column").replace([1,2,3], [0,0,1])).
8

Update. November 2023

With the release of polars 0.19.16, map_dict was deprecated in favour of pl.Expr.replace.

import polars as pl

df = pl.DataFrame({
    "a": [1, 2, 3, 4, 5]
})

mapping_dict = {
    1: 0,
    2: 0,
    3: 10,
    4: 10
}

df.with_columns(
    pl.all().replace(mapping_dict)
)

Output.

shape: (5, 1)
┌─────┐
│ a   │
│ --- │
│ i64 │
╞═════╡
│ 0   │
│ 0   │
│ 10  │
│ 10  │
│ 5   │
└─────┘

Note that the default behaviour has changed to keep any values not present in the mapping unchanged. To restore the previous default behaviour (and replace any values not present in the mapping with None), pass default=None as follows.

df.with_columns(
    pl.all().replace(mapping_dict, default=None)
)

Output.

shape: (5, 1)
┌──────┐
│ a    │
│ ---  │
│ i64  │
╞══════╡
│ 0    │
│ 0    │
│ 10   │
│ 10   │
│ null │
└──────┘

Comments

3

Just in case you like the pandas docstrings as well and want to place it as a utils function somewhere in your repo

def replace(column: str, mapping: dict) -> pl.internals.expr.Expr:
    """
    Create a polars expression that replaces a columns values.

    Parameters
    ----------
    column : str
        Column name on which values should be replaced.
    mapping : dict
        Can be used to specify different replacement values for different existing values. For example,
        ``{'a': 'b', 'y': 'z'}`` replaces the value ‘a’ with ‘b’ and ‘y’ with ‘z’. Values not mentioned in ``mapping``
        will stay the same.

    Returns
    -------
    pl.internals.expr.Expr
        Expression that contains instructions to replace values in ``column`` according to ``mapping``.

    Raises
    ------
    Exception
        * If ``mapping`` is empty.
    TypeError
        * If ``column`` is not ``str``.
        * If ``mapping`` is not ``dict``.
    polars.exceptions.PanicException
        * When ``mapping`` has keys or values that are not mappable to arrows format. Only catchable via BaseException.
          See also https://pola-rs.github.io/polars-book/user-guide/datatypes.html.

    Examples
    --------
    >>> import polars as pl
    >>> df = pl.DataFrame({'fruit':['banana', 'apple', 'pie']})
    >>> df
    shape: (3, 1)
    ┌────────┐
    │ fruit  │
    │ ---    │
    │ str    │
    ╞════════╡
    │ banana │
    ├╌╌╌╌╌╌╌╌┤
    │ apple  │
    ├╌╌╌╌╌╌╌╌┤
    │ apple  │
    └────────┘
    >>> df.with_column(replace(column='fruit', mapping={'apple': 'pomegranate'}))
    shape: (3, 1)
    ┌─────────────┐
    │ fruit       │
    │ ---         │
    │ str         │
    ╞═════════════╡
    │ banana      │
    ├╌╌╌╌╌╌╌╌╌╌╌╌╌┤
    │ pomegranate │
    ├╌╌╌╌╌╌╌╌╌╌╌╌╌┤
    │ pomegranate │
    └─────────────┘

    """
    if not mapping:
        raise Exception("Mapping can't be empty")
    elif not isinstance(mapping, dict):
        TypeError(f"mapping must be of type dict, but is type: {type(mapping)}")
    if not isinstance(column, str):
        raise TypeError(f"column must be of type str, but is type: {type(column)}")

    branch = pl.when(pl.col(column) == list(mapping.keys())[0]).then(
        list(mapping.values())[0]
    )

    for from_value, to_value in mapping.items():
        branch = branch.when(pl.col(column) == from_value).then(to_value)

    return branch.otherwise(pl.col(column)).alias(column)

Comments

2

This could also be done by

  1. converting a mapping dictionary into a polars.DataFrame
  2. LEFT JOINing the original data to the mapper DataFrame
  3. fill any missing values (which the mapper dict does not account for) with the original values
  4. drop the original column

Here is the code to show this

Data

df = pl.DataFrame({"a": [1, 2, 3, 4, 5]})
print(df)
shape: (5, 1)
┌─────┐
│ a   │
│ --- │
│ i64 │
╞═════╡
│ 1   │
│ 2   │
│ 3   │
│ 4   │
│ 5   │
└─────┘

Define the mapper dict

mapper = {1: 0, 2: 0, 3: 10, 4: 10}

Create a mapper DataFrame

df_mapper = pl.DataFrame([{"a": k, "values": v} for k, v in mapper.items()])
print(df_mapper)
shape: (4, 2)
┌─────┬────────┐
│ a   ┆ values │
│ --- ┆ ---    │
│ i64 ┆ i64    │
╞═════╪════════╡
│ 1   ┆ 0      │
│ 2   ┆ 0      │
│ 3   ┆ 10     │
│ 4   ┆ 10     │
└─────┴────────┘

Use LEFT JOIN and .fill_null() to map the values, and then drop the original column

df = (
    df
    # LEFT JOIN
    .join(df_mapper, on=["a"], how="left")
    # fill missing values in mapped column with values from original column
    .with_columns([pl.col("values").fill_null(pl.col("a"))])
    # drop original column and replace with mapped column
    .drop(["a"])
    .rename({"values": "a"})
)
print(df)
shape: (5, 1)
┌─────┐
│ a   │
│ --- │
│ i64 │
╞═════╡
│ 0   │
│ 0   │
│ 10  │
│ 10  │
│ 5   │
└─────┘

Comments

0

Can't use code snippet in comments, so I'll post this slight generalization as an answer.

In case the value to be mapped is missing from the mapping, this accepts a default value if provided, else it will act as if the mapping is the identity mapping.

import polars as pl
def apply_map(
    column: str, mapping: dict, default = None
) -> pl.Expr:
    branch = pl
    for key, value in mapping.items():
        branch = branch.when(pl.col(column) == key).then(value)
    default = pl.lit(default) if default is not None else pl.col(column)
    return branch.otherwise(default).alias(column)

Comments

0

You can also use map_elements with a dict, as long as you specify an exhaustive mapping for each from_ option:

df = pl.DataFrame({"a": [1, 2, 3, 4, 5]})

from_ = [1, 2, 3, 4, 5]
to_ = [99, 12, 4, 18, 64]
my_map = dict(zip(from_, to_))

df.select(pl.col("a").map_elements(lambda x: my_map[x]))

which outputs:

shape: (5, 1)
┌─────┐
│ a   │
│ --- │
│ i64 │
╞═════╡
│ 99  │
│ 12  │
│ 4   │
│ 18  │
│ 64  │
└─────┘

It'll be slower than ritchie46's answer but it's quite a bit simpler.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.