Polars str.starts_with() with values from another column

Question

I have a Polars DataFrame, for example:

>>> df = pl.DataFrame({'A': ['a', 'b', 'c', 'd'], 'B': ['app', 'nop', 'cap', 'tab']})
>>> df
shape: (4, 2)
┌─────┬─────┐
│ A   ┆ B   │
│ --- ┆ --- │
│ str ┆ str │
╞═════╪═════╡
│ a   ┆ app │
│ b   ┆ nop │
│ c   ┆ cap │
│ d   ┆ tab │
└─────┴─────┘

I'm trying to get a third column C which is True if the string in column B starts with the string in column A of the same row, and False otherwise. So in the case above, I'd expect:

┌─────┬─────┬───────┐
│ A   ┆ B   ┆ C     │
│ --- ┆ --- ┆ ---   │
│ str ┆ str ┆ bool  │
╞═════╪═════╪═══════╡
│ a   ┆ app ┆ true  │
│ b   ┆ nop ┆ false │
│ c   ┆ cap ┆ true  │
│ d   ┆ tab ┆ false │
└─────┴─────┴───────┘

I'm aware of the df['B'].str.starts_with() function but passing in a column yielded:

>>> df['B'].str.starts_with(pl.col('A'))
...  # Some stuff here.
TypeError: argument 'sub': 'Expr' object cannot be converted to 'PyString'

What's the way to do this? In pandas, you would do:

df.apply(lambda d: d['B'].startswith(d['A']), axis=1)

I am just starting to learn polars and there may be other ways, but I think we can compare the two columns slice by slice: df.with_columns( (pl.col('B').str.slice(0,1) == pl.col('A').str.slice(0,1)).alias('bool_') )
– r-beginners, Commented Jan 16, 2023 at 12:01
@r-beginners This is a good start, but what I want to do is a little more complicated, which is why I want to use the starts_with function: column A could hold longer strings.
– Syafiq Kamarul Azman, Commented Jan 16, 2023 at 12:29
It looks like only a couple of the regex methods in the .str namespace are currently set up to accept expressions. Perhaps this should be filed as a feature request.
– jqurious, Commented Jan 17, 2023 at 13:01

jqurious · Accepted Answer · 2024-10-06 12:37:52Z

5

Expression support was added for .str.starts_with() in pull/6355 as part of the Polars 0.15.17 release.

df.with_columns(pl.col("B").str.starts_with(pl.col("A")).alias("C"))

shape: (4, 3)
┌─────┬─────┬───────┐
│ A   ┆ B   ┆ C     │
│ --- ┆ --- ┆ ---   │
│ str ┆ str ┆ bool  │
╞═════╪═════╪═══════╡
│ a   ┆ app ┆ true  │
│ b   ┆ nop ┆ false │
│ c   ┆ cap ┆ true  │
│ d   ┆ tab ┆ false │
└─────┴─────┴───────┘

edited Oct 6, 2024 at 12:37

answered Jan 26, 2023 at 20:15

jqurious


jqurious · Accepted Answer · 2024-10-14 22:52:39Z

Using a struct is another option if polars>=0.13.16 (in older releases, map_elements was called apply). Like the map_rows answer, however, this approach calls Python's str.startswith on each row instead of the native polars.Expr.str.starts_with.

Code:

import polars as pl

df = pl.DataFrame({'A': ['a', 'b', 'c', 'd'], 'B': ['app', 'nop', 'cap', 'tab']})

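df.with_columns(
    # Each row arrives as a dict; the check runs in Python, not in Polars' native engine.
    pl.struct('A', 'B')
    .map_elements(lambda r: r['B'].startswith(r['A']), return_dtype=pl.Boolean)
    .alias('C')
)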

Output:

┌─────┬─────┬───────┐
│ A   ┆ B   ┆ C     │
│ --- ┆ --- ┆ ---   │
│ str ┆ str ┆ bool  │
╞═════╪═════╪═══════╡
│ a   ┆ app ┆ true  │
│ b   ┆ nop ┆ false │
│ c   ┆ cap ┆ true  │
│ d   ┆ tab ┆ false │
└─────┴─────┴───────┘

Reference:

How to write polars custom apply function that does the processing row by row?

jqurious · Accepted Answer · 2024-10-14 22:54:07Z

0

Okay, after toying around for a bit, this works, but I'm pretty sure it uses Python strings under the hood (judging by the method name startswith) and is therefore not optimized:

>>> pl.concat((df, df.map_rows(lambda d: d[1].startswith(d[0]))), how="horizontal")
shape: (4, 3)
┌─────┬─────┬───────┐
│ A   ┆ B   ┆ map   │
│ --- ┆ --- ┆ ---   │
│ str ┆ str ┆ bool  │
╞═════╪═════╪═══════╡
│ a   ┆ app ┆ true  │
│ b   ┆ nop ┆ false │
│ c   ┆ cap ┆ true  │
│ d   ┆ tab ┆ false │
└─────┴─────┴───────┘

I'll put up a feature request on Polars to see if this can be improved.

edited Oct 14, 2024 at 22:54

jqurious


answered Jan 18, 2023 at 4:06

Syafiq Kamarul Azman

