
I want to extract the part of each value that comes before (Impressions). For example, if the value is YouTube TrueView for Reach (Impressions), I need YouTube TrueView for Reach.

Another example is YouTube Bumper (Impressions) --> YouTube Bumper

I am currently using:

validated_df=validated_df.withColumn("MediaNm", when(col("MediaNm").like("%Impressions%"),F.regexp_extract(F.col("MediaNm"), r".*?\(", 0)).otherwise(validated_df.MediaNm))

This gives me a blank value as the result.

  • Can you provide sample data input? Commented Sep 30, 2022 at 9:11
  • Column: media_name. Values: YouTube TrueView for Reach (Impressions), YouTube Bumper (Impressions). Expected values to extract from the above: YouTube TrueView for Reach, YouTube Bumper. Commented Sep 30, 2022 at 9:23
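If you want to keep the extract-based approach from the question, one option is to capture the text before "(Impressions)" in a group and return group 1 instead of the whole match (group 0). The sketch below checks that pattern with Python's `re`, assuming Spark's Java regex engine treats this simple pattern the same way. `strip_impressions` is a hypothetical helper that mirrors the question's `when(...).otherwise(...)` logic; it is not part of PySpark.

```python
import re

# Assumed PySpark equivalent (not verified against the asker's data):
#   F.when(F.col("MediaNm").like("%Impressions%"),
#          F.regexp_extract("MediaNm", r"^(.*?)\s*\(Impressions\)", 1)
#   ).otherwise(F.col("MediaNm"))
IMPRESSIONS_PATTERN = re.compile(r"^(.*?)\s*\(Impressions\)")


def strip_impressions(value: str) -> str:
    """Hypothetical helper: text before "(Impressions)", else the value unchanged."""
    if "Impressions" not in value:  # same guard as like("%Impressions%")
        return value
    match = IMPRESSIONS_PATTERN.search(value)
    # regexp_extract returns "" when the pattern does not match at all
    return match.group(1) if match else ""


for sample in ["YouTube TrueView for Reach (Impressions)", "YouTube Bumper (Impressions)"]:
    print(strip_impressions(sample))
```

The lazy `(.*?)` combined with `\s*` keeps the space before the parenthesis out of the captured group.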

1 Answer


If I understood correctly, you just want to remove the string ' (Impressions)'. For that, a regexp_replace is enough:

from pyspark.sql import functions as F
validated_df = validated_df.withColumn('MediaNm', F.regexp_replace('MediaNm', r' \(Impressions\)', ''))

+--------------------------+
|MediaNm                   |
+--------------------------+
|YouTube TrueView for Reach|
|YouTube Bumper            |
+--------------------------+
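Because regexp_replace returns values that do not contain the pattern unchanged, the `when(...).otherwise(...)` guard from the question is not needed with this approach. The sketch below illustrates that behavior with Python's `re.sub`, assuming it matches what Spark's `regexp_replace` does for this literal pattern (both replace every occurrence). The third row, "YouTube Masthead", is a hypothetical value without the suffix.

```python
import re

# Assumed to mirror F.regexp_replace("MediaNm", r" \(Impressions\)", ""):
# every occurrence of the literal " (Impressions)" is deleted.
SUFFIX = re.compile(r" \(Impressions\)")


def remove_impressions(value: str) -> str:
    return SUFFIX.sub("", value)


# "YouTube Masthead" is a hypothetical row with no suffix; it passes through as-is.
for row in ["YouTube TrueView for Reach (Impressions)", "YouTube Bumper (Impressions)", "YouTube Masthead"]:
    print(remove_impressions(row))
```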