I hope someone can help with a simple sentiment analysis task in PySpark. I have a PySpark DataFrame in which each row contains a single word. I also have a set of common stopwords.
I want to remove every row whose word (the value in the `word` column) appears in the stopwords set.
Input:
+-------+
|   word|
+-------+
|    the|
|   food|
|     is|
|amazing|
|    and|
|  great|
+-------+
stopwords = {'the', 'is', 'and'}
Expected Output:
+-------+
|   word|
+-------+
|   food|
|amazing|
|  great|
+-------+