3

I have a polars DataFrame, for example:

>>> df = pl.DataFrame({'A': ['a', 'b', 'c', 'd'], 'B': ['app', 'nop', 'cap', 'tab']})
>>> df
shape: (4, 2)
┌─────┬─────┐
│ A   ┆ B   │
│ --- ┆ --- │
│ str ┆ str │
╞═════╪═════╡
│ a   ┆ app │
│ b   ┆ nop │
│ c   ┆ cap │
│ d   ┆ tab │
└─────┴─────┘

I'm trying to get a third column C that is True if the string in column B starts with the string in column A of the same row, and False otherwise. For the example above, I'd expect:

┌─────┬─────┬───────┐
│ A   ┆ B   ┆ C     │
│ --- ┆ --- ┆ ---   │
│ str ┆ str ┆ bool  │
╞═════╪═════╪═══════╡
│ a   ┆ app ┆ true  │
│ b   ┆ nop ┆ false │
│ c   ┆ cap ┆ true  │
│ d   ┆ tab ┆ false │
└─────┴─────┴───────┘

I'm aware of the df['B'].str.starts_with() function, but passing in a column expression raised:

>>> df['B'].str.starts_with(pl.col('A'))
...  # Some stuff here.
TypeError: argument 'sub': 'Expr' object cannot be converted to 'PyString'

What's the way to do this? In pandas, you would do:

df.apply(lambda d: d['B'].startswith(d['A']), axis=1)

  • I am just starting to learn polars and there may be other ways, but I think we can compare them in their own slices. df.with_column( (pl.col('B').str.slice(0,1) == pl.col('A').str.slice(0,1)).alias('bool_') ) Commented Jan 16, 2023 at 12:01
  • @r-beginners This is a good start, but what I want to do is a little more complicated, which is why I want to use the starts_with function: column A could hold longer strings. Commented Jan 16, 2023 at 12:29
  • It looks like only a couple of the regex methods in the .str namespace are currently set up to accept expressions. Perhaps this should be filed as a feature request. Commented Jan 17, 2023 at 13:01

3 Answers

5

Expression support was added for .str.starts_with() in pull/6355 as part of the Polars 0.15.17 release.

df.with_columns(pl.col("B").str.starts_with(pl.col("A")).alias("C"))
shape: (4, 3)
┌─────┬─────┬───────┐
│ A   ┆ B   ┆ C     │
│ --- ┆ --- ┆ ---   │
│ str ┆ str ┆ bool  │
╞═════╪═════╪═══════╡
│ a   ┆ app ┆ true  │
│ b   ┆ nop ┆ false │
│ c   ┆ cap ┆ true  │
│ d   ┆ tab ┆ false │
└─────┴─────┴───────┘

0

Using a struct is another option if polars>=0.13.16. Note, however, that like the map_rows answer, this approach calls Python's str.startswith on each row rather than polars.Expr.str.starts_with.

Code:

import polars as pl

df = pl.DataFrame({'A': ['a', 'b', 'c', 'd'], 'B': ['app', 'nop', 'cap', 'tab']})

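df.with_columns(
    # Pack A and B into one struct so the lambda receives each row as a dict.
    pl.struct('A', 'B')
    .map_elements(lambda r: r['B'].startswith(r['A']), return_dtype=pl.Boolean)
    .alias('C')
)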

Output:

┌─────┬─────┬───────┐
│ A   ┆ B   ┆ C     │
│ --- ┆ --- ┆ ---   │
│ str ┆ str ┆ bool  │
╞═════╪═════╪═══════╡
│ a   ┆ app ┆ true  │
│ b   ┆ nop ┆ false │
│ c   ┆ cap ┆ true  │
│ d   ┆ tab ┆ false │
└─────┴─────┴───────┘

Reference:

How to write polars custom apply function that does the processing row by row?

0

Okay, after toying around for a bit, this works, but I'm pretty sure it uses Python strings under the hood (judging by the function name startswith) and is therefore not optimized:

>>> pl.concat((df, df.map_rows(lambda d: d[1].startswith(d[0]))), how="horizontal")
shape: (4, 3)
┌─────┬─────┬───────┐
│ A   ┆ B   ┆ map   │
│ --- ┆ --- ┆ ---   │
│ str ┆ str ┆ bool  │
╞═════╪═════╪═══════╡
│ a   ┆ app ┆ true  │
│ b   ┆ nop ┆ false │
│ c   ┆ cap ┆ true  │
│ d   ┆ tab ┆ false │
└─────┴─────┴───────┘

I'll put up a feature request on Polars to see if this can be improved.
