
Suppose I want to remove every substring that matches the regular expression "RT\s*@USER\w\w{8}:\s*" from the strings in my RDD.

My current RDD is:

text = sc.textFile(...)
delimited = text.map(lambda x: x.split("\t"))

Here is the part where I try to remove the matches. I tried the following RDD transformations to strip every substring that matches the regular expression, but each attempt raised an error.

abc = delimited.map(lambda x: re.sub(r"RT\s*@USER\w\w{8}:\s*", " ", x))
TypeError: expected string or buffer

and

abc = re.sub(r"RT\s*@USER\w\w{8}:\s*", " ", delimited)
TypeError: expected string or buffer

and

abc = delimited.map(lambda x: re.sub(r"RT\s*@USER\w\w{8}:\s*", " ", text))
Exception: It appears that you are attempting to broadcast an RDD or reference an RDD from an action or transformation. RDD transformations and actions can only be invoked by the driver, not inside of other transformations; for example, rdd1.map(lambda x: rdd2.values.count() * x) is invalid because the values transformation and count action cannot be performed inside of the rdd1.map transformation. For more information, see SPARK-5063.

I want to remove these matches so that I can proceed to the next RDD transformations. How do I write this in PySpark?

1 Answer


re.sub expects a string as its third argument, and each of your three attempts passes something else.

  • In the first anonymous function:

    lambda x: re.sub(r"RT\s*@USER\w\w{8}:\s*", " ", x)
    

    x is a list, since you split the line in the previous transformation.

  • In the second attempt, you pass an RDD: delimited.

  • In the third snippet, you pass another RDD: text. Referencing an RDD inside a transformation is also what triggers the SPARK-5063 exception.
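The first case can be reproduced without Spark. The sketch below uses a made-up input line (the sample text and the helper name try_sub are assumptions for illustration, not from the question) to show that re.sub rejects the list produced by split but accepts a single field from it.

```python
import re

PATTERN = r"RT\s*@USER\w\w{8}:\s*"

# Hypothetical tab-separated line, split the same way as in the question.
row = "RT @USER_abcdefgh: hello\t42".split("\t")

def try_sub(value):
    """Run re.sub and return either the result or the TypeError message."""
    try:
        return re.sub(PATTERN, " ", value)
    except TypeError as exc:
        return "TypeError: {}".format(exc)

list_result = try_sub(row)       # a list is not a string, so this fails
string_result = try_sub(row[0])  # a single field is a string, so this works

print(list_result)
print(repr(string_result))
```

The exact wording of the TypeError message differs between Python versions, so only the exception type should be relied on.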

If you want to apply the substitution to every element of each list, try this:

abc = delimited.map(lambda l: [re.sub(r"RT\s*@USER\w\w{8}:\s*", " ", x) for x in l])
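As a rough sketch of the same idea, the per-row logic can be moved into a named function and checked locally before it goes into the RDD pipeline. The sample lines, the name clean_row, and the use of a precompiled pattern are assumptions for illustration; here the built-in map stands in for the RDD's map so the snippet runs without Spark.

```python
import re

# Compile once so the pattern is not re-parsed for every field.
MENTION = re.compile(r"RT\s*@USER\w\w{8}:\s*")

def clean_row(fields):
    """Replace every match with a single space in each field of one row."""
    return [MENTION.sub(" ", field) for field in fields]

# Hypothetical input lines standing in for sc.textFile(...).
lines = [
    "RT @USER_abcdefgh: first tweet\t2015-01-01",
    "no retweet prefix here\t2015-01-02",
]

# Same shape as the question's `delimited`: one list of fields per line.
delimited = [line.split("\t") for line in lines]

# With Spark this step would be written as delimited.map(clean_row).
cleaned = list(map(clean_row, delimited))
print(cleaned)
```

Either way, the script needs import re at the top, and the function passed to map must receive one row (a list of strings) at a time rather than the RDD itself.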