
I have a dataframe as follows:

import polars as pl

df = pl.DataFrame({'r_num':['Yes', '', 'Yes'], 'pin': ['Yes','',''],'fin':['','','']})
shape: (3, 3)
┌───────┬─────┬─────┐
│ r_num ┆ pin ┆ fin │
│ ---   ┆ --- ┆ --- │
│ str   ┆ str ┆ str │
╞═══════╪═════╪═════╡
│ Yes   ┆ Yes ┆     │
│       ┆     ┆     │
│ Yes   ┆     ┆     │
└───────┴─────┴─────┘

Here I would like to find the rows where r_num is Yes, pin is Yes, and fin is empty. For rows that meet this condition, r_num and pin should both be set to an empty string.

df.with_columns(
    pl.when((pl.col('r_num')=='Yes') & (pl.col('pin')=='Yes') & (pl.col('fin') !='Yes'))
    .then(pl.col('r_num')=='')
    .otherwise(pl.col('r_num'))
)
shape: (3, 3)
┌───────┬─────┬─────┐
│ r_num ┆ pin ┆ fin │
│ ---   ┆ --- ┆ --- │
│ str   ┆ str ┆ str │
╞═══════╪═════╪═════╡
│ false ┆ Yes ┆     │
│       ┆     ┆     │
│ Yes   ┆     ┆     │
└───────┴─────┴─────┘

Why is r_num getting filled with false?

This is how I would do it in pandas:

df_pd = df.to_pandas()
df_pd.loc[(df_pd['r_num']=='Yes') & (df_pd['pin']=='Yes') & (df_pd['fin']!='Yes'),['r_num','pin']] = ''

Expected result:

shape: (3, 3)
┌───────┬─────┬─────┐
│ r_num ┆ pin ┆ fin │
│ ---   ┆ --- ┆ --- │
│ str   ┆ str ┆ str │
╞═══════╪═════╪═════╡
│       ┆     ┆     │
│       ┆     ┆     │
│ Yes   ┆     ┆     │
└───────┴─────┴─────┘
Comment (Sep 14, 2022 at 17:11): For reference, this answer shows how to change multiple columns within a single when/then/otherwise: stackoverflow.com/a/73718390/18559875

1 Answer

You had a small bug in your code. Within the .then() method, you'd specified

pl.col('r_num')==''

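This is a comparison, not an assignment: it evaluates to False for the first row (where the when condition is True). The behaviour of .then() is to insert the value of its expression into the cell being evaluated, so that row receives false instead of an empty string.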

What you want is .then(pl.lit("")), and to tell polars that you want to essentially overwrite r_num (and pin) with .alias("r_num") / .alias("pin").

Since the value in .otherwise depends on which column you want to overwrite, I think we have to write the when/then/otherwise expression twice, once per column:

(
    df
    .with_columns(
        pl.when((pl.col("r_num") == "Yes") & (pl.col("pin") == "Yes") & (pl.col("fin") != "Yes"))
          .then(pl.lit("")) # changed here
          .otherwise(pl.col("r_num"))
          .alias("r_num"), # added this
        pl.when((pl.col("r_num") == "Yes") & (pl.col("pin") == "Yes") & (pl.col("fin") != "Yes"))
          .then(pl.lit("")) # changed here
          .otherwise(pl.col("pin"))
          .alias("pin"), # added this
    )
)

This results in your desired output:

shape: (3, 3)
┌───────┬─────┬─────┐
│ r_num ┆ pin ┆ fin │
│ ---   ┆ --- ┆ --- │
│ str   ┆ str ┆ str │
╞═══════╪═════╪═════╡
│       ┆     ┆     │
│       ┆     ┆     │
│ Yes   ┆     ┆     │
└───────┴─────┴─────┘