
I am trying to run a custom function on a lazy dataframe on a row-by-row basis. The function itself does not matter, so I'm using softmax as a stand-in; all that matters is that it is not computable via Polars expressions.

I get about this far:

import polars as pl
import numpy as np

def softmax(t):
    a = np.exp(np.array(t))
    return tuple(a/np.sum(a))

ldf = pl.DataFrame({ 'id': [1,2,3], 'a': [0.2,0.1,0.3], 'b': [0.4,0.1,0.3], 'c': [0.4,0.8,0.4]}).lazy()

cols = ['a','b','c']
redict = { f'column_{i}':c for i,c in enumerate(cols) }

ldf.select(cols).map_batches(lambda bdf: bdf.map_rows(softmax).rename(redict)).collect()

However, if I want the resulting lazy df to also contain columns other than cols (such as id), I get stuck, because

ldf.with_columns(pl.col(cols).map_batches(lambda bdf: bdf.map_rows(softmax).rename(redict))).collect()

no longer works: pl.col(cols).map_batches applies the function column by column, so the callback never sees a whole row...

This does not seem like it would be an uncommon use case, so I'm wondering if I'm missing something.

  • FWIW, polars is very resistant to row-by-row operations and the APIs are, in my experience, correspondingly limited. Commented Mar 3 at 14:55

2 Answers


Polars is pretty averse to row-by-row operations. Generally, if you need that, I'd suggest unpivoting (formerly, “melting”) and computing over the id column.

ldf.unpivot(index="id").with_columns(
    pl.col("value").map_batches(softmax).over("id")
).collect()
shape: (9, 3)
┌─────┬──────────┬──────────┐
│ id  ┆ variable ┆ value    │
│ --- ┆ ---      ┆ ---      │
│ i64 ┆ str      ┆ f64      │
╞═════╪══════════╪══════════╡
│ 1   ┆ a        ┆ 0.290461 │
│ 2   ┆ a        ┆ 0.249143 │
│ 3   ┆ a        ┆ 0.322043 │
│ 1   ┆ b        ┆ 0.35477  │
│ 2   ┆ b        ┆ 0.249143 │
│ 3   ┆ b        ┆ 0.322043 │
│ 1   ┆ c        ┆ 0.35477  │
│ 2   ┆ c        ┆ 0.501713 │
│ 3   ┆ c        ┆ 0.355913 │
└─────┴──────────┴──────────┘

If you need this back in wide format, you can pivot the resulting DataFrame.

ldf.unpivot(index="id").with_columns(
    pl.col("value").map_batches(softmax).over("id")
).collect().pivot("variable", index="id")
shape: (3, 4)
┌─────┬──────────┬──────────┬──────────┐
│ id  ┆ a        ┆ b        ┆ c        │
│ --- ┆ ---      ┆ ---      ┆ ---      │
│ i64 ┆ f64      ┆ f64      ┆ f64      │
╞═════╪══════════╪══════════╪══════════╡
│ 1   ┆ 0.290461 ┆ 0.35477  ┆ 0.35477  │
│ 2   ┆ 0.249143 ┆ 0.249143 ┆ 0.501713 │
│ 3   ┆ 0.322043 ┆ 0.322043 ┆ 0.355913 │
└─────┴──────────┴──────────┴──────────┘

2 Comments

What you are doing applies softmax over each column, not on a row-by-row basis, so it does not really solve the problem I'm having.
Sorry, switched to being over id instead of variable, which corresponds to row-wise in the original df.

I actually found a relatively nice solution that just takes advantage of batches being materialized in memory.

import numpy as np
import polars as pl

def softmax(ar):
    a = np.exp(ar)
    # keepdims so the row sums broadcast correctly against the 2D input
    return a/np.sum(a, axis=-1, keepdims=True)

def apply_npf_on_pl_df(df, cols, npf):
    # replace the selected columns with the NumPy result, keeping the rest
    res = npf(df.select(cols).to_numpy())
    return df.with_columns(pl.Series(c, res[:, i]) for i, c in enumerate(cols))

ldf = pl.DataFrame({ 'id': [1,2,3], 'a': [0.2,0.1,0.3], 'b': [0.4,0.1,0.3], 'c': [0.4,0.8,0.4]}).lazy()

cols = ['a','b','c']

ldf.map_batches(lambda bdf: apply_npf_on_pl_df(bdf,cols,softmax)).collect()

This is likely not ideal if there are a lot of other columns (map_batches materializes the whole batch), but for my use case (with very few additional columns) this looks pretty efficient.

Comments
