I've tried searching around, but many of the answers seem to be outdated and no longer work with the current Polars version. How do I apply a Python function to each row of a Polars DataFrame and store its return value in a new column? I want to pass the entire row to the function rather than specific columns.
import polars as pl
def test2(auth, row):
    c = row["Group"]
    d = row["Val"]
    return "{}-{}-{}".format(c, str(d), auth)
df = pl.DataFrame({
    'Group': ['A', 'B', 'C', 'D', 'E'],
    'Val': [1001, 1002, 1003, 1004, 1005]
})
auth_token = "xxxxxxxxx"
df = df.with_columns(
    pl.struct(pl.all())
    .map_batches(lambda x: test2(auth_token, x))
    .alias("response")
)
print(df)
The code above causes this error. I don't understand this message. Where am I supposed to set strict=False and why is this necessary?
Traceback (most recent call last):
  File "c:\Scripting\Python\Development\Test.py", line 29, in <module>
    df = df.with_columns(
  File "c:\Scripting\Python\Development\venv\lib\site-packages\polars\dataframe\frame.py", line 8763, in with_columns
    return self.lazy().with_columns(*exprs, **named_exprs).collect(_eager=True)
  File "c:\Scripting\Python\Development\venv\lib\site-packages\polars\lazyframe\frame.py", line 1942, in collect
    return wrap_df(ldf.collect(callback))
polars.exceptions.ComputeError: TypeError: unexpected value while building Series of type Int64; found value of type String: "C"
Hint: Try setting `strict=False` to allow passing data with mixed types.
I'm aware that I could do this by specifying the columns explicitly, as in the code below, but I want to pass in the whole row and select which columns to use inside the function. Any help would be appreciated. Thank you.
df = df.with_columns(
    (
        pl.struct(["Group", "Val"]).map_batches(
            lambda x: test(auth_token, x.struct.field("Group"), x.struct.field("Val"))
        )
    ).alias("api_response")
)
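Use map_elements() instead of map_batches(). map_batches() hands the whole column to your function as a single Series, whereas map_elements() calls the function once per row and passes each struct value as a dict, so row["Group"] and row["Val"] work as you expect. It will be quite slow, though, because it runs a Python function for every row and bypasses Polars' optimizations.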