5

Loving the Polars library for its fantastic speed and easy syntax!

I'm struggling with this question: is there an analogue in Polars for the Pandas code below? I would like to replace strings using a dictionary.

I tried this expression, but it raises "TypeError: 'dict' object is not callable":

pl.col("List").str.replace_all(lambda key: key,dict())

In Pandas I would use .replace()

import polars as pl

df = pl.DataFrame({'List': ['Systems', 'Software', 'Cleared']})

mapping = {'Systems':'Sys','Software':'Soft' ,'Cleared':'Clr'}

pl.from_pandas(df.to_pandas().replace(mapping, regex=True))

Output:

shape: (3, 1)
┌──────┐
│ List │
│ ---  │
│ str  │
╞══════╡
│ Sys  │
│ Soft │
│ Clr  │
└──────┘
1
  • I would think you're better off turning your lookup dict into another DataFrame and then joining them. (Commented Dec 11, 2022 at 2:33)

2 Answers

4

There is a "stale" feature request for accepting a dictionary:

One possible workaround is to stack multiple expressions in a loop:

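expr = pl.col("List")

# Chain one replace_all per entry of the question's `mapping` dict;
# each key is treated as a regex pattern.
for old, new in mapping.items():
    expr = expr.str.replace_all(old, new)

df.with_columns(result=expr)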
shape: (3, 2)
┌──────────┬────────┐
│ List     ┆ result │
│ ---      ┆ ---    │
│ str      ┆ str    │
╞══════════╪════════╡
│ Systems  ┆ Sys    │
│ Software ┆ Soft   │
│ Cleared  ┆ Clr    │
└──────────┴────────┘

For non-regex cases, there is also .str.replace_many():

df.with_columns(
   pl.col("List").str.replace_many(
       ["Systems", "Software", "Cleared"],
       ["Sys", "Soft", "Clr"]
   )
   .alias("result")
)

2 Comments

Just wondering, as this is now nearly 1.5 years old: is this still the best way to do regex substitutions? E.g. with SUBSTITUTIONS = {"_": " ", "€": "eur", r"(?<![\w])eu(?![\w])": "eur"} and then for pat, val in SUBSTITUTIONS.items(): df = df.with_columns(pl.col('text').str.replace_all(pat, val))
@Björn It depends. .str.replace_many now exists (non-regex). Also, that regex won't work in this case, as lookaround assertions are not supported by the underlying Rust regex library.
1

I think your best bet would be to turn your dict into a DataFrame and join the two.

First, convert the dict into a shape that makes a tidy DataFrame. A list of dicts works:

dicdf = pl.DataFrame([{'List': x, 'newList': y} for x, y in mapping.items()])

Here List is the name of your existing column, and newList is an arbitrary name for the new column, which we'll get rid of later.

Then join that with your original df and select every column except the old List, plus newList renamed to List. Note that a join matches whole values, not substrings.

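df = (
    df.join(dicdf, on='List')  # default inner join: rows without a match are dropped
    .select([
        pl.exclude(['List', 'newList']),  # every other column, unchanged
        pl.col('newList').alias('List'),  # replacement values under the old name
    ])
)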

1 Comment

Definitely an option, thanks
