I have the URL https://www.youcustomizeit.com/p/Equations-Kids-Backpack-Personalized/301793\r in a dataset. I want to remove https:// from the start of the string and \r from the end.

Creating a DataFrame to replicate the issue:

from pyspark.sql import functions as F  # F is used by the regexp_replace calls

c = spark.createDataFrame([('https://www.youcustomizeit.com/p/Equations-Kids-Backpack-Personalized/301793\r',)], ['str'])

I tried the regexp_replace below, using a pipe for alternation, but it does not work as expected.

c.select(F.regexp_replace('str', 'https:// | \\r', '')).first()

Actual output: www.youcuomizei.comEquaion-Kid-Backack-Peronalized301793

Expected output: www.youcustomizeit.com/p/Equations-Kids-Backpack-Personalized/301793

I had added an extra space before and after the pipe. Once I removed them, it worked: c.select(F.regexp_replace("str", "https://|[\\r]","")).first() Commented Oct 21, 2022 at 22:46
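The point about spaces around the pipe can be illustrated with a small sketch. It uses Python's built-in re module as a stand-in for Spark's regex engine, on the assumption that literal spaces, alternation and the \r escape behave the same way in both; the variable names are only illustrative.

```python
import re

s = "https://www.youcustomizeit.com/p/Equations-Kids-Backpack-Personalized/301793\r"

# With spaces around the pipe, the two alternatives are "https:// " (trailing
# space) and " \r" (leading space). Neither occurs in s, so nothing is removed.
with_spaces = re.sub("https:// | \\r", "", s)

# Without the spaces, the alternatives are "https://" and a carriage return.
without_spaces = re.sub("https://|\\r", "", s)

print(with_spaces == s)
print(repr(without_spaces))
```

Spaces inside a regex are ordinary characters to match, so padding the pipe for readability changes what each alternative means.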

1 Answer


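In a regular Python string literal, \r is read as a single carriage-return character, so the two-character text \r (a backslash followed by r) does not show up in your original spark.createDataFrame object. To store it literally you have to escape the backslash, so your spark.createDataFrame call should be as follows. Note the double backslash: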

c = spark.createDataFrame([("https://www.youcustomizeit.com/p/Equations-Kids-Backpack-Personalized/301793\\r",)], ['str'])

which will give this output:

+------------------------------------------------------------------------------+
|str                                                                           |
+------------------------------------------------------------------------------+
|https://www.youcustomizeit.com/p/Equations-Kids-Backpack-Personalized/301793\r|
+------------------------------------------------------------------------------+

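Your regex https://|[\\r] will not remove the literal \r: the Python string \\r reaches the regex engine as \r, which matches a carriage return rather than a backslash followed by r. The regex should be: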

c = (c
    .withColumn("str", F.regexp_replace("str", "https://|[\\\\]r", "")) 
)

which will give this output:

+--------------------------------------------------------------------+
|str                                                                 |
+--------------------------------------------------------------------+
|www.youcustomizeit.com/p/Equations-Kids-Backpack-Personalized/301793|
+--------------------------------------------------------------------+
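Whether the column holds a real carriage return or the literal two characters \r depends on how the data was loaded, so a pattern that covers both cases may be safer. The following is a minimal sketch using Python's re module rather than Spark; the names PATTERN and clean are hypothetical, and it assumes the prefix should only be stripped at the start and the \r only at the end of the string.

```python
import re

URL = "www.youcustomizeit.com/p/Equations-Kids-Backpack-Personalized/301793"

with_cr = "https://" + URL + "\r"        # ends with one carriage-return character
with_literal = "https://" + URL + "\\r"  # ends with a backslash followed by 'r'

# ^https://    -> the scheme, only at the start of the string
# (?:\r|\\r)$  -> a carriage return OR a literal backslash + r, only at the end
PATTERN = r"^https://|(?:\r|\\r)$"


def clean(value: str) -> str:
    """Strip the leading scheme and a trailing \\r in either representation."""
    return re.sub(PATTERN, "", value)


print(clean(with_cr) == URL)
print(clean(with_literal) == URL)
```

Anchoring with ^ and $ keeps the pattern from touching an r that merely follows a backslash elsewhere in the string. If the same pattern is passed to F.regexp_replace from a non-raw Python string, the backslashes need doubling as in the answer above.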