Lambda function with for loop and if-else statement

Question

I have a dataframe df that looks like this

index       Posts                    clean_text
  0     Hi I am fine.              [Hi, I, am, fine]
  1     You are a piece of shit.   [You, are, a, piece, of, shit]
.
.
.

I have a list named corpus that has 3000 foul words.

I want to go through the column clean_text and add a new column result to the df by checking a condition for every row. The condition is: if any word in a row's clean_text list is present in corpus, the result column should contain the string Irrelevant; otherwise, Relevant.

Example: if any word of the list [Hi, I, am, fine] is present in corpus, the result column will have Irrelevant; otherwise, Relevant. Since this list does not contain any foul words, the output should be Relevant.

The desired output is:

index       Posts                    clean_text                       result
  0     Hi I am fine.              [Hi, I, am, fine]                  Relevant
  1     You are a piece of shit.   [You, are, a, piece, of, shit]     Irrelevant
.
.
.

I want to do this using a lambda function. I have done this so far:

df['result'] = df['clean_text'].map(lambda x: ["Relevant" for w in x if w not in corpus])

First, I am unable to write the else part here, and second, it produces undesired output like the following:

index       Posts                    clean_text                       result
  0     Hi I am fine.              [Hi, I, am, fine]                  [Relevant, Relevant, Relevant, Relevant]
  1     You are a piece of shit.   [You, are, a, piece, of, shit]     [Relevant, Relevant, Relevant,...]
.
.
.

I also tried writing a for loop like this, but it takes a lot of time:

for i in range(df.shape[0]):
    for word in df.loc[i]['clean_text']:
      if word in corpus:
        df['result'] = "Irrelevant"
        #break
      else:
        #continue
        df['result'] = "Relevant"

Kindly help me get the desired output using a lambda function.

Why does it have to be a lambda expression? Why not a regular function definition?
– juanpa.arrivillaga, Commented Feb 5, 2021 at 8:31
Probably the biggest problem is that you are using a list for corpus. You should use a set.
– juanpa.arrivillaga, Commented Feb 5, 2021 at 8:32
Because membership testing in a set is constant time, whereas in a list it's linear time.
– juanpa.arrivillaga, Commented Feb 5, 2021 at 8:38

juanpa.arrivillaga · Accepted Answer · 2021-02-05 08:42:49Z

Use corpus = set(corpus).
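To see why the conversion matters, here is a rough sketch that times a membership test against both containers. The word list is a hypothetical stand-in for the 3000-word corpus (the names corpus_list and corpus_set are made up for illustration), and the exact timings will vary by machine.

```python
import timeit

# Hypothetical stand-in for the 3000-word corpus from the question.
corpus_list = [f"word{i}" for i in range(3000)]
corpus_set = set(corpus_list)

# Both containers give the same answer to a membership test.
assert ("word2999" in corpus_list) == ("word2999" in corpus_set)

# A miss is the worst case for the list: every element is compared.
# The set does a single hash lookup instead.
list_time = timeit.timeit(lambda: "missing" in corpus_list, number=1000)
set_time = timeit.timeit(lambda: "missing" in corpus_set, number=1000)

print(f"list: {list_time:.4f}s  set: {set_time:.4f}s")
```

Since the check runs once per word in every row, that per-lookup difference adds up quickly on a large dataframe.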

Then you can use something like

df['result'] = df['clean_text'].map(lambda l: "Irrelevant" if any(x in corpus for x in l) else "Relevant")

Note, the fact that you are using a lambda is really not relevant. You could have done something like:

def search_corpus(tokens):
    # A post is irrelevant as soon as one of its tokens is in the corpus.
    if any(token in corpus for token in tokens):
        return "Irrelevant"
    return "Relevant"

And do:

df['result'] = df['clean_text'].map(search_corpus)

And this won't affect performance. lambda expressions don't create anything special, and you never have to use one.
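For completeness, here is a small end-to-end sketch that follows the labelling rule stated in the question (a corpus match means Irrelevant). The two-word corpus, the sample posts, and the helper name label_post are all made up for illustration; only the column names come from the question.

```python
import pandas as pd

# Hypothetical miniature corpus; the real one has about 3000 words.
corpus = {"badword", "rude"}

# Hypothetical sample data using the column names from the question.
df = pd.DataFrame({
    "Posts": ["Hi I am fine.", "You are a rude person."],
    "clean_text": [
        ["Hi", "I", "am", "fine"],
        ["You", "are", "a", "rude", "person"],
    ],
})

def label_post(tokens):
    """Return "Irrelevant" if any token is in corpus, else "Relevant"."""
    return "Irrelevant" if any(token in corpus for token in tokens) else "Relevant"

# map() calls label_post once per row of the clean_text column.
df["result"] = df["clean_text"].map(label_post)

print(df[["Posts", "result"]])
```

Note that the membership test is case-sensitive, so this assumes the tokens and the corpus words use the same casing.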

2 Comments

d_b Over a year ago

Thanks for the solution. Also, is there a way to check which words match the corpus and add those in a separate column?

juanpa.arrivillaga Over a year ago

@dipanjana Why not map(lambda l: [x for x in l if x in corpus])?
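Spelled out, that suggestion might look like the sketch below. The column name matched, the tiny corpus, and the sample rows are assumptions made for illustration, not part of the original thread.

```python
import pandas as pd

# Hypothetical miniature corpus, stored as a set for fast lookups.
corpus = {"badword", "rude"}

# Hypothetical sample rows in the shape of the clean_text column.
df = pd.DataFrame({
    "clean_text": [
        ["Hi", "I", "am", "fine"],
        ["You", "are", "a", "rude", "badword"],
    ],
})

# Keep only the tokens that appear in the corpus, preserving their order.
df["matched"] = df["clean_text"].map(lambda tokens: [t for t in tokens if t in corpus])

print(df)
```

Rows without any match get an empty list, so the same column could also drive the Relevant/Irrelevant label.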
