Sample df:

import polars as pl
import numpy as np
df = pl.DataFrame(
    {
        "nrs": [1, 2, 3, None, 5],
        "names": ["foo", "ham", "spam", "egg", None],
        "random": np.random.rand(5),
        "A": [True, True, False, False, False],
    }
)

I want to replace column random. So far, I've been doing

new = np.arange(5)
df.replace('random', pl.Series(new))

Note that replace is one of the few Polars methods that work in place!

But now I'm getting

C:\Users\...\AppData\Local\Temp\ipykernel_18244\1406681700.py:2: DeprecationWarning: `replace` is deprecated. DataFrame.replace is deprecated and will be removed in a future version. Please use
    df = df.with_columns(new_column.alias(column_name))
instead.
  df = df.replace('random', pl.Series(new)) 

So, should I do

df = df.with_columns(pl.Series(new).alias('random'))

This seems more verbose, and in-place modification is gone. Am I doing this right?

2 Answers

Disclaimer: I think the Polars developers want to nudge users away from in-place updates. Also, pl.DataFrame.with_columns is a cheap operation: it is heavily optimized and does not simply copy the underlying data. Hence, using

df = df.with_columns(pl.Series("random", new))

seems like the best approach. See this answer for more information.


Still, if you need in-place updates (e.g. because you implemented a library function whose interface depends on them), you can use pl.DataFrame.replace_column, which takes the index of the column to replace and the new Series.

new_col = pl.Series("random", np.arange(5))
df.replace_column(df.columns.index(new_col.name), new_col)

Yes, you are doing it right. Use with_columns as follows:

import polars as pl
import numpy as np

df = pl.DataFrame({
    "nrs": [1, 2, 3, None, 5],
    "names": ["foo", "ham", "spam", "egg", None],
    "random": np.random.rand(5), 
    "A": [True, True, False, False, False],
})

print(df)
new = np.arange(5)

new_series = pl.Series('random', new)

df_new = df.with_columns(new_series)

print(df_new)

Here is the original df:

shape: (5, 4)
┌──────┬───────┬──────────┬───────┐
│ nrs  ┆ names ┆ random   ┆ A     │
│ ---  ┆ ---   ┆ ---      ┆ ---   │
│ i64  ┆ str   ┆ f64      ┆ bool  │
╞══════╪═══════╪══════════╪═══════╡
│ 1    ┆ foo   ┆ 0.736232 ┆ true  │
│ 2    ┆ ham   ┆ 0.017485 ┆ true  │
│ 3    ┆ spam  ┆ 0.940966 ┆ false │
│ null ┆ egg   ┆ 0.157872 ┆ false │
│ 5    ┆ null  ┆ 0.003914 ┆ false │
└──────┴───────┴──────────┴───────┘

And here is the new one:

shape: (5, 4)
┌──────┬───────┬────────┬───────┐
│ nrs  ┆ names ┆ random ┆ A     │
│ ---  ┆ ---   ┆ ---    ┆ ---   │
│ i64  ┆ str   ┆ i64    ┆ bool  │
╞══════╪═══════╪════════╪═══════╡
│ 1    ┆ foo   ┆ 0      ┆ true  │
│ 2    ┆ ham   ┆ 1      ┆ true  │
│ 3    ┆ spam  ┆ 2      ┆ false │
│ null ┆ egg   ┆ 3      ┆ false │
│ 5    ┆ null  ┆ 4      ┆ false │
└──────┴───────┴────────┴───────┘
