How to apply custom functions with multiple parameters in Polars?

Question

Now I have a dataframe:

import pandas as pd

df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 3, 4, 5, 6],
    "c": [3, 4, 5, 6, 7],
})

The function:

def fun(a, b, shift_len):
    return a + b * shift_len, b - shift_len

Using Pandas, I can get the result by:

df[["d", "e"]] = df.apply(lambda row: fun(row["a"], row["b"], 3), axis=1, result_type="expand")

How can I get the same result using Polars?


jqurious · Accepted Answer · 2024-07-17 20:28:31Z

The answer depends on whether you can rewrite your function using Polars expressions.

Using Polars Expressions

To obtain the best performance with Polars, try to code your calculations using Expressions. Expressions yield the most performant, embarrassingly parallel solutions.

For example, your function could be expressed as:

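import polars as pl

# df is assumed to hold the question's data as a Polars DataFrame (pl.DataFrame)
shift_len = 3
df.with_columns(
    (pl.col("a") + (pl.col("b") * shift_len)).alias("d"),
    (pl.col("b") - shift_len).alias("e"),
)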

shape: (5, 5)
┌─────┬─────┬─────┬─────┬─────┐
│ a   ┆ b   ┆ c   ┆ d   ┆ e   │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╪═════╪═════╡
│ 1   ┆ 2   ┆ 3   ┆ 7   ┆ -1  │
│ 2   ┆ 3   ┆ 4   ┆ 11  ┆ 0   │
│ 3   ┆ 4   ┆ 5   ┆ 15  ┆ 1   │
│ 4   ┆ 5   ┆ 6   ┆ 19  ┆ 2   │
│ 5   ┆ 6   ┆ 7   ┆ 23  ┆ 3   │
└─────┴─────┴─────┴─────┴─────┘

Polars will run both expressions in parallel, yielding very fast results.

Using `map_elements`

Let's assume that you cannot code your function as Polars Expressions (e.g., you need to use an external library). In Polars, you can use map_elements to run custom functions.

Since your function takes multiple parameters and returns multiple values, we'll take this in steps.

Passing multiple values

We can pass multiple values to the fun function when using map_elements by "stamp-coupling" multiple columns into a single series using polars.struct. In the lambda function, each row's values are passed as a Python dict, with the names of the columns as the keys. So, for example, we access the value in column a in the lambda below as cols["a"].

df.with_columns(
    pl.struct("a", "b")
    .map_elements(lambda cols: fun(cols["a"], cols["b"], 3))
    .alias("result")
)

shape: (5, 4)
┌─────┬─────┬─────┬───────────┐
│ a   ┆ b   ┆ c   ┆ result    │
│ --- ┆ --- ┆ --- ┆ ---       │
│ i64 ┆ i64 ┆ i64 ┆ list[i64] │
╞═════╪═════╪═════╪═══════════╡
│ 1   ┆ 2   ┆ 3   ┆ [7, -1]   │
│ 2   ┆ 3   ┆ 4   ┆ [11, 0]   │
│ 3   ┆ 4   ┆ 5   ┆ [15, 1]   │
│ 4   ┆ 5   ┆ 6   ┆ [19, 2]   │
│ 5   ┆ 6   ┆ 7   ┆ [23, 3]   │
└─────┴─────┴─────┴───────────┘

The result column contains the tuples returned by the fun function. However, note the type of the result column: Polars turned them into a list.

Handling multiple return values

Next we'll convert the tuple returned by the fun function to something more useful: a dictionary of key-value pairs, where the keys are the desired column names (d and e in your example).

We'll accomplish this by using Python's zip function and a tuple with the desired names.

When we run this code, we will get a column of type struct.

df.with_columns(
    pl.struct("a", "b")
    .map_elements(lambda cols: dict(zip(("d", "e"), fun(cols["a"], cols["b"], 3))))
    .alias("result")
)

shape: (5, 4)
┌─────┬─────┬─────┬───────────┐
│ a   ┆ b   ┆ c   ┆ result    │
│ --- ┆ --- ┆ --- ┆ ---       │
│ i64 ┆ i64 ┆ i64 ┆ struct[2] │
╞═════╪═════╪═════╪═══════════╡
│ 1   ┆ 2   ┆ 3   ┆ {7,-1}    │
│ 2   ┆ 3   ┆ 4   ┆ {11,0}    │
│ 3   ┆ 4   ┆ 5   ┆ {15,1}    │
│ 4   ┆ 5   ┆ 6   ┆ {19,2}    │
│ 5   ┆ 6   ┆ 7   ┆ {23,3}    │
└─────┴─────┴─────┴───────────┘

The names d and e do not appear in the printed result column, but they are stored as the field names of the struct.

Using `unnest`

In the last step, we'll use the unnest method to break the struct into two new columns, one per field.

df.with_columns(
    pl.struct("a", "b")
    .map_elements(lambda cols: dict(zip(("d", "e"), fun(cols["a"], cols["b"], 3))))
    .alias("result")
).unnest("result")

shape: (5, 5)
┌─────┬─────┬─────┬─────┬─────┐
│ a   ┆ b   ┆ c   ┆ d   ┆ e   │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╪═════╪═════╡
│ 1   ┆ 2   ┆ 3   ┆ 7   ┆ -1  │
│ 2   ┆ 3   ┆ 4   ┆ 11  ┆ 0   │
│ 3   ┆ 4   ┆ 5   ┆ 15  ┆ 1   │
│ 4   ┆ 5   ┆ 6   ┆ 19  ┆ 2   │
│ 5   ┆ 6   ┆ 7   ┆ 23  ┆ 3   │
└─────┴─────┴─────┴─────┴─────┘

One caution: using map_elements with external libraries and/or custom Python bytecode subjects your code to the Python GIL. The result is slow, single-threaded performance, no matter how the function is coded. As such, I strongly suggest avoiding map_elements and custom Python functions where you can, and coding your algorithms using only Polars Expressions instead.

maya · Accepted Answer · 2022-07-15 09:01:58Z

-1

Passing arguments with args

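import pandas as pd

df1 = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [2, 3, 4, 5, 6], "c": [3, 4, 5, 6, 7]})


def t(row, col1, col2, shift_len):
    # With axis=1, apply passes each row as a Series, followed by the values in args.
    return row[col1] + row[col2] * shift_len, row[col2] - shift_len


# result_type="expand" spreads the returned tuple across the new columns d and e.
df1[["d", "e"]] = df1.apply(t, args=("a", "b", 3), axis=1, result_type="expand")
print(df1)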

OUTPUT:

   a  b  c   d  e
0  1  2  3   7 -1
1  2  3  4  11  0
2  3  4  5  15  1
3  4  5  6  19  2
4  5  6  7  23  3

