I have a DataFrame `df` that looks like this:

    index  Posts                     clean_text
    0      Hi I am fine.             [Hi, I, am, fine]
    1      You are a piece of shit.  [You, are, a, piece, of, shit]
    ...
I have a list named `corpus` that contains 3000 foul words.
I want to go through the `clean_text` column and add a new column `result` to `df` by checking a condition for every row. The condition is: if any of the words in a row's `clean_text` list is present in `corpus`, `result` should be the string `Irrelevant`; otherwise it should be `Relevant`.
Example: if any word of the list [Hi, I, am, fine] is present in `corpus`, `result` should be `Irrelevant`, otherwise `Relevant`. Since this list does not contain any foul words, the output should be `Relevant`.
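The condition for a single row can be written with Python's built-in `any()`, which stops at the first word that matches. A minimal sketch; `label_words` is a hypothetical helper name, matching is assumed to be exact and case-sensitive, and the two-word `corpus` is only a stand-in for the real 3000-word list:

```python
def label_words(words, corpus):
    """Return "Irrelevant" if any word in `words` is in `corpus`, else "Relevant".

    Matching is exact and case-sensitive ("Shit" would not match "shit").
    """
    return "Irrelevant" if any(w in corpus for w in words) else "Relevant"


# Hypothetical stand-in for the real 3000-word corpus.
corpus = ["shit", "damn"]

print(label_words(["Hi", "I", "am", "fine"], corpus))                   # Relevant
print(label_words(["You", "are", "a", "piece", "of", "shit"], corpus))  # Irrelevant
```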
The desired output is:

    index  Posts                     clean_text                      result
    0      Hi I am fine.             [Hi, I, am, fine]               Relevant
    1      You are a piece of shit.  [You, are, a, piece, of, shit]  Irrelevant
    ...
I want to do this using a lambda function. This is what I have so far:

    df['result'] = df['clean_text'].map(lambda x: ["Relevant" for w in x if w not in corpus])
Firstly, I am unable to write the `else` part here, and secondly, it produces an undesired output like this:

    index  Posts                     clean_text                      result
    0      Hi I am fine.             [Hi, I, am, fine]               [Relevant, Relevant, Relevant, Relevant]
    1      You are a piece of shit.  [You, are, a, piece, of, shit]  [Relevant, Relevant, Relevant, ...]
    ...
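The attempt returns a list because a list comprehension always builds a list, with one element for every word that passes its `if` filter, and that filter position has no `else`. Moving an `if ... else` into the expression position would still give a list, just one element per word. A conditional expression (`A if cond else B`) wrapped around `any()` yields a single string per row instead. A small plain-Python comparison on one row, using a hypothetical one-word `corpus`:

```python
corpus = ["shit"]  # hypothetical stand-in for the real corpus
row = ["You", "are", "a", "piece", "of", "shit"]

# List comprehension: one "Relevant" for each word NOT in corpus, so a list.
attempt = ["Relevant" for w in row if w not in corpus]
print(attempt)  # ['Relevant', 'Relevant', 'Relevant', 'Relevant', 'Relevant']

# Conditional expression around any(): a single string for the whole row.
fixed = "Irrelevant" if any(w in corpus for w in row) else "Relevant"
print(fixed)  # Irrelevant
```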
I also tried writing a `for` loop like this, but it takes a lot of time:
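
    for i in range(df.shape[0]):
        for word in df.loc[i, 'clean_text']:
            if word in corpus:
                # set the value for this row only, then stop checking its words
                df.loc[i, 'result'] = "Irrelevant"
                break
        else:
            # runs only if the inner loop finished without a break
            df.loc[i, 'result'] = "Relevant"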
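If raw speed matters more than using a lambda, one swapped-in alternative (neither the loop above nor a lambda) is to flatten the lists with `Series.explode()`, test every word at once with `Series.isin()`, and collapse back to one boolean per row with `groupby(level=0).any()`. A sketch, assuming `df` has a unique index and `clean_text` holds lists of strings; the sample `df` and `corpus` are hypothetical:

```python
import pandas as pd

# Hypothetical sample data mirroring the question.
df = pd.DataFrame({
    "Posts": ["Hi I am fine.", "You are a piece of shit."],
    "clean_text": [["Hi", "I", "am", "fine"],
                   ["You", "are", "a", "piece", "of", "shit"]],
})
corpus = ["shit", "damn"]  # stand-in for the real 3000-word list

# One element per word; each word keeps the index label of its original row.
words = df["clean_text"].explode()

# True where a word is in corpus, then "any True" per original row label.
has_foul = words.isin(corpus).groupby(level=0).any()

# Assignment aligns on the index, so row order does not matter here.
df["result"] = has_foul.map({True: "Irrelevant", False: "Relevant"})
print(df[["Posts", "result"]])
```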
Kindly help me get the desired output using a lambda function.
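A minimal sketch of the lambda version: put a conditional expression around `any()` so each row maps to exactly one string. Converting `corpus` to a set once, outside the lambda, turns each `in` check into a hash lookup; the name `corpus_set` and the sample data are assumptions for illustration:

```python
import pandas as pd

# Hypothetical sample data mirroring the question.
df = pd.DataFrame({
    "Posts": ["Hi I am fine.", "You are a piece of shit."],
    "clean_text": [["Hi", "I", "am", "fine"],
                   ["You", "are", "a", "piece", "of", "shit"]],
})
corpus = ["shit", "damn"]  # stand-in for the real 3000-word list

# Build the set once so it is not rebuilt for every row.
corpus_set = set(corpus)

# One string per row; any() stops at the first word found in corpus_set.
df["result"] = df["clean_text"].map(
    lambda x: "Irrelevant" if any(w in corpus_set for w in x) else "Relevant"
)
print(df)
```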
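Tip: membership lookup in a `set` is constant time on average, whereas in a `list` it is linear time, so converting `corpus` to a set (`set(corpus)`) should make the `in` checks faster.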
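With `corpus` stored as a set, `set.isdisjoint()` can express the whole condition in one call: it returns `True` when the set shares no element with the iterable it is given, which here means the row contains no foul words. A plain-Python sketch with hypothetical sample rows and a stand-in corpus:

```python
# Hypothetical stand-in corpus, stored as a set.
corpus_set = {"shit", "damn"}

rows = [["Hi", "I", "am", "fine"],
        ["You", "are", "a", "piece", "of", "shit"]]

# isdisjoint() is True when no word of the row appears in corpus_set.
labels = ["Relevant" if corpus_set.isdisjoint(words) else "Irrelevant"
          for words in rows]
print(labels)  # ['Relevant', 'Irrelevant']
```

Applied to the DataFrame, the same test fits in a lambda: `df['clean_text'].map(lambda x: 'Relevant' if corpus_set.isdisjoint(x) else 'Irrelevant')`.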