I have a PySpark DataFrame like the input data below. I would like to create a new column, product1_num, that holds the first number found in each record of the productname column. Example output is below. I'm not sure what PySpark offers for string splitting and regex matching. Can anyone suggest how to do this in PySpark?

input data:

+------+-------------------+
|id    |productname        |
+------+-------------------+
|234832|EXTREME BERRY SAUCE|
|419836|BLUE KOSHER SAUCE  |
|350022|GUAVA (1G)         |
|123213|GUAVA 1G           |
+------+-------------------+

output:

+------+-------------------+-------------+
|id    |productname        |product1_num |
+------+-------------------+-------------+
|234832|EXTREME BERRY SAUCE|             |
|419836|BLUE KOSHER SAUCE  |             |
|350022|GUAVA (1G)         |1            |
|123213|GUAVA G5           |5            |
|125513|3GULA G5           |3            |
|127143|GUAVA G50          |50           |
|124513|LAAVA C2L5         |2            |
+------+-------------------+-------------+

1 Answer

You can use regexp_extract:

from pyspark.sql import functions as F

# Capture group 1 holds the first run of digits; rows with no digits get an empty string.
df.withColumn("product1_num", F.regexp_extract("productname", "([0-9]+)", 1)).show()

+------+-------------------+------------+
|    id|        productname|product1_num|
+------+-------------------+------------+
|234832|EXTREME BERRY SAUCE|            |
|419836|  BLUE KOSHER SAUCE|            |
|350022|         GUAVA (1G)|           1|
|123213|           GUAVA G5|           5|
|125513|           3GULA G5|           3|
|127143|          GUAVA G50|          50|
|124513|         LAAVA C2L5|           2|
+------+-------------------+------------+
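
To check the pattern without starting a Spark session, the same regex can be tried with Python's built-in re module. This is only a rough sketch: first_number is a hypothetical helper, not part of PySpark, written to mimic how regexp_extract returns the first match of group 1 and an empty string when nothing matches.

```python
import re

# Same pattern as in the answer: one capture group of consecutive digits.
FIRST_NUMBER = re.compile(r"([0-9]+)")


def first_number(productname: str) -> str:
    """Hypothetical helper mimicking regexp_extract(col, pattern, 1)."""
    match = FIRST_NUMBER.search(productname)
    # regexp_extract yields an empty string (not null) when there is no match.
    return match.group(1) if match else ""


names = ["EXTREME BERRY SAUCE", "GUAVA (1G)", "3GULA G5", "GUAVA G50", "LAAVA C2L5"]
for name in names:
    print(f"{name!r} -> {first_number(name)!r}")
```

Note that product1_num comes back as a string column, so a cast would still be needed if a numeric type is wanted.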