
I know how to apply a function to every column of a pandas DataFrame. However, I have not yet figured out how to do the same with a Polars DataFrame.

I checked the section of the Polars User Guide devoted to this topic, but I have not found the answer. Below is a code snippet with my unsuccessful attempts.

import numpy as np
import polars as pl
import seaborn as sns

# Loading toy dataset as Pandas DataFrame using Seaborn
df_pd = sns.load_dataset('iris')

# Converting Pandas DataFrame to Polars DataFrame
df_pl = pl.DataFrame(df_pd)

# Dropping the non-numeric column...
df_pd = df_pd.drop(columns='species')                     # ... using Pandas
df_pl = df_pl.drop('species')                             # ... using Polars

# Applying function to the whole DataFrame...
df_pd_new = df_pd.apply(np.log2)                          # ... using Pandas
# df_pl_new = df_pl.apply(np.log2)                        # ... using Polars?

# Applying lambda function to the whole DataFrame...
df_pd_new = df_pd.apply(lambda c: np.log2(c))             # ... using Pandas
# df_pl_new = df_pl.apply(lambda c: np.log2(c))           # ... using Polars?

Thanks in advance for your help and your time.

  • Can you change the tag to python-polars? Commented Jan 29, 2022 at 16:08
  • Of course. I just added python-polars tag to the original question tags. Commented Jan 30, 2022 at 17:35

1 Answer


You can use the expression syntax: select all columns with pl.all(), then use map_batches to apply NumPy's np.log2 function to each column.

df.select(
    pl.all().map_batches(np.log2)
)

Note that we choose map_batches here because map_elements would call the function once for each individual value:

map_elements = pl.Series(np.log2(value) for value in pl.Series([1, 2, 3]))

But np.log2 can be called once with multiple values, which would be faster.

map_batches = np.log2(pl.Series([1, 2, 3]))

See the User guide for more.

  • map_elements: Call a function separately on each value in the Series.
  • map_batches: Always passes the full Series to the function.

NumPy

Polars expressions also support NumPy universal functions (ufuncs), so you can pass a Polars expression directly to a NumPy ufunc:

df.select(
    np.log2(pl.all())
)