I would like to add multiple columns at once to a Polars dataframe, where each column is derived from the same per-row object: the object should be created only once, and each column should hold the result of a different method call on it. Here is a simplified example using a list built from a range:

import polars as pl

df = pl.DataFrame({
    'x': [11, 22],
})

def uses_object(x):
    r = list(range(0, x))
    c10 = r.count(10)
    c12 = r.count(12)
    return c10, c12

df = df.with_columns(
    count_of_10 = pl.col('x').map_elements(lambda x: uses_object(x)[0]),
    count_of_12 = pl.col('x').map_elements(lambda x: uses_object(x)[1]),
)

print(df)
shape: (2, 3)
┌─────┬─────────────┬─────────────┐
│ x   ┆ count_of_10 ┆ count_of_12 │
│ --- ┆ ---         ┆ ---         │
│ i64 ┆ i64         ┆ i64         │
╞═════╪═════════════╪═════════════╡
│ 11  ┆ 1           ┆ 0           │
│ 22  ┆ 1           ┆ 1           │
└─────┴─────────────┴─────────────┘

I tried multiple assignment:

df = df.with_columns(
    count_of_10, count_of_12 = uses_object(pl.col('x')),
)

but got an error:

NameError: name 'count_of_10' is not defined

Can I change the code to call uses_object only once?

2 Answers

If you return a dictionary from your function:

return dict(count_of_10=c10, count_of_12=c12)

You will get a struct column:

df.with_columns(
   count = pl.col('x').map_elements(uses_object)
)
shape: (2, 2)
┌─────┬───────────┐
│ x   ┆ count     │
│ --- ┆ ---       │
│ i64 ┆ struct[2] │
╞═════╪═══════════╡
│ 11  ┆ {1,0}     │
│ 22  ┆ {1,1}     │
└─────┴───────────┘

You can then .unnest() it into individual columns:

df.with_columns(
   count = pl.col('x').map_elements(uses_object)
).unnest('count')
shape: (2, 3)
┌─────┬─────────────┬─────────────┐
│ x   ┆ count_of_10 ┆ count_of_12 │
│ --- ┆ ---         ┆ ---         │
│ i64 ┆ i64         ┆ i64         │
╞═════╪═════════════╪═════════════╡
│ 11  ┆ 1           ┆ 0           │
│ 22  ┆ 1           ┆ 1           │
└─────┴─────────────┴─────────────┘

With your current approach (returning a tuple), you would call the function once and then use Polars list methods to extract the values in a separate .with_columns / .select, e.g.:

df.with_columns(
   count = pl.col('x').map_elements(uses_object)
).with_columns(
   count_of_10 = pl.col('count').list.first(),
   count_of_12 = pl.col('count').list.last(),
).drop('count')
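As for the NameError in the question: inside a call's parentheses, `a, b = f()` is not tuple unpacking. Python reads it as a positional argument `a` followed by a keyword argument `b=f()`, so it tries to look up the name `a` first. The sketch below reproduces that in plain Python; with_columns here is a hypothetical stand-in that only mimics the call shape, not the Polars method, and make_pair is an illustrative helper.

```python
def with_columns(*exprs, **named_exprs):
    # Hypothetical stand-in: just returns what it was called with.
    return exprs, named_exprs

def make_pair():
    return 1, 2

try:
    # Parsed as: positional argument `count_of_10`, keyword argument `count_of_12=...`.
    # The undefined name `count_of_10` is looked up and raises NameError.
    with_columns(count_of_10, count_of_12=make_pair())
    error_message = None
except NameError as exc:
    error_message = str(exc)

print(error_message)

# A single keyword argument receives the whole tuple; nothing is unpacked.
_, named = with_columns(count_of_12=make_pair())
print(named)
```

So there is no call syntax that splits one return value across two keyword arguments; the function has to return a single value (a struct or list) that is split afterwards.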

1 Comment

Thanks! I used your first suggestion: I like how it puts all the return information (column names and values) in one place, namely uses_object().

You can use list.to_struct() and unnest() to convert the returned list into separate columns:

df.with_columns(
    cnt=pl.col('x').map_elements(uses_object)
).with_columns(
    pl.col('cnt').list.to_struct(fields=['count_of_10','count_of_12'])
).unnest('cnt')

┌─────┬─────────────┬─────────────┐
│ x   ┆ count_of_10 ┆ count_of_12 │
│ --- ┆ ---         ┆ ---         │
│ i64 ┆ i64         ┆ i64         │
╞═════╪═════════════╪═════════════╡
│ 11  ┆ 1           ┆ 0           │
│ 22  ┆ 1           ┆ 1           │
└─────┴─────────────┴─────────────┘

1 Comment

Thanks. That gave an error as written, but it worked fine, including naming the unnested columns correctly, once I removed the second with_columns clause. This worked:

df = df.with_columns(
    cnt=pl.col('x').map_elements(uses_object)
).unnest('cnt')
