Python: process multiple strings at the same time

Question

I have a list of strings and I want to remove the stop words inside each string. The thing is, the stopwords list is much longer than the strings, and I don't want to keep comparing each string against the stopwords list over and over. Is there a way in Python to process these multiple strings at the same time?

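lis = ['aka', 'this is a good day', 'a pretty dog']
stopwords = []  # pretty long list of words
result = []
for phrase in lis:
    words = phrase.split(' ')  # get list of words
    kept = []
    for word in words:
        if word not in stopwords:  # lists have no .contain() method; use the `in` operator
            kept.append(word)
    result.append(' '.join(kept))  # rebuild the phrase without the stopwords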

This is my current method, but it means I have to go through all the phrases in the list. Is there a way to process all these phrases while comparing against the stopwords only once?

Thanks.

How long is "long"? If it's less than 100,000 elements, I wouldn't worry about it, especially if you make stopwords into a set, as x in set checking is very fast. — Kevin, Dec 5, 2014 at 16:26

A nested list comprehension would maybe be nicer (or more confusing?) to look at, but this is pretty much the best way I can see to do this. — TehTris, Dec 5, 2014 at 16:28

@Kevin Well, it's 100,000 long, but I still don't want to check it multiple times. — JudyJiang, Dec 5, 2014 at 16:29

You do have to check each phrase, but as Kevin said, using a set would make each lookup O(1) on average. — Padraic Cunningham, Dec 5, 2014 at 16:30

Some complexity comparisons show that checking x in stopwords is linear in time if stopwords is a list and constant in time if it is a set (as Kevin said). In other words, with a set you would (almost) not feel the difference between a small one and a huge one; it's fast in both cases. — Dettorer, Dec 5, 2014 at 16:36
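A rough way to see the list-versus-set difference described in these comments is to time a single membership test with the standard timeit module. This is only a sketch: the "stopwords" are hypothetical placeholder strings ("w0", "w1", ...), the function name compare_lookup is made up for illustration, and the absolute timings depend on your machine, so no numbers are shown here.

```python
import timeit


def compare_lookup(n=100_000, repeats=100):
    """Time `probe in container` for a list vs. a set holding the same n words.

    The words are hypothetical placeholders ("w0", "w1", ...), not a real stopword list.
    """
    words = [f"w{i}" for i in range(n)]
    as_list = words
    as_set = set(words)
    probe = "not-a-stopword"  # absent word: worst case, the list is scanned end to end
    t_list = timeit.timeit(lambda: probe in as_list, number=repeats)
    t_set = timeit.timeit(lambda: probe in as_set, number=repeats)
    return t_list, t_set


t_list, t_set = compare_lookup()
print(f"list: {t_list:.4f} s, set: {t_set:.6f} s")
```

With 100,000 words the list lookup has to walk every element for a missing word, while the set lookup hashes the word once, which is why converting stopwords to a set matters far more than how the phrases are looped over.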

Cory Kramer · Accepted Answer · 2014-12-05 16:27:20Z

3

This is the same idea, but with a few improvements. Convert your list of stopwords to a set for faster lookups. Then you can iterate over your phrase list in a list comprehension. You can then iterate over the words in the phrase, and keep them if they're not in the stop set, then join the phrase back together.

>>> lis = ['aka', 'this is a good day', 'a pretty dog']
>>> stopwords = ['a', 'dog']
>>> stop = set(stopwords)
>>> [' '.join(j for j in i.split(' ') if j not in stop) for i in lis]
['aka', 'this is good day', 'pretty']
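The same approach can be wrapped in a small reusable function. The name remove_stopwords is hypothetical. The set is built once per call, and after that each word in each phrase is checked exactly once with an average O(1) lookup; that is about as close to "compare only once" as this problem allows, since every word still has to be looked at. Unlike the answer above, this sketch uses split() with no argument, which also collapses runs of whitespace.

```python
def remove_stopwords(phrases, stopwords):
    """Return each phrase with stopwords removed, keeping the original word order."""
    stop = set(stopwords)  # build the set once; membership tests are O(1) on average
    return [' '.join(w for w in phrase.split() if w not in stop) for phrase in phrases]


lis = ['aka', 'this is a good day', 'a pretty dog']
print(remove_stopwords(lis, ['a', 'dog']))
# ['aka', 'this is good day', 'pretty']
```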


Raydel Miranda · Accepted Answer · 2014-12-05 16:47:27Z

1

You could compute the set difference between the words in each phrase and the stop words.

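>>> lis = ['aka', 'this is a good day', 'a pretty dog']
>>> stopwords = ['a', 'dog']
>>> stop = set(stopwords)
>>> # map() returns a lazy iterator in Python 3, so wrap it in list() to get a list
>>> result = list(map(lambda phrase: " ".join(set(phrase.split(' ')) - stop), lis))
>>> print(result)  # word order inside each phrase can vary: sets are unordered
['aka', 'this is good day', 'pretty']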


That actually messes up the order of the words in the phrases, since you make a set out of the split. With lis = ['a b c d e f g'] it gives ['c b e d g f']. — Dettorer
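If you want to keep the behaviour of the set-difference answer (each word appears at most once in the result) but preserve the original word order, one option is dict.fromkeys, since dicts keep insertion order in Python 3.7 and later. A sketch, with a hypothetical function name:

```python
def remove_stopwords_dedup(phrases, stopwords):
    """Drop stopwords and duplicate words, keeping the first occurrence order."""
    stop = set(stopwords)
    result = []
    for phrase in phrases:
        # dict.fromkeys keeps the first occurrence of each word, in order (Python 3.7+)
        unique_words = dict.fromkeys(phrase.split(' '))
        result.append(' '.join(w for w in unique_words if w not in stop))
    return result


print(remove_stopwords_dedup(['a b c d e f g'], ['a']))        # ['b c d e f g']
print(remove_stopwords_dedup(['the cat saw the dog'], ['dog']))  # ['the cat saw']
```

If you do not actually want duplicates removed, the list-comprehension answer above is simpler and keeps every word.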
