If string does not contain any of list of strings in python

Question

I have a list of strings, from which I want to locate every line that has 'http://' in it, but does not have 'lulz', 'lmfao', '.png', or any other items in a list of strings in it. How would I go about this?

My instincts tell me to use regular expressions, but I have a moral objection to witchcraft.

Andrew Clark · Accepted Answer · 2012-03-08 01:07:58Z

14

Here is an option that is fairly extensible if the list of strings to exclude is large:

exclude = ['lulz', 'lmfao', '.png']
filter_func = lambda s: 'http://' in s and not any(x in s for x in exclude)

# In Python 3, filter() returns a lazy iterator; wrap it in list() to get a list.
matching_lines = list(filter(filter_func, string_list))

List comprehension alternative:

matching_lines = [line for line in string_list if filter_func(line)]
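For reference, here is a minimal self-contained sketch of the same idea using a named function instead of a lambda. The function name `is_wanted` and the sample data are assumptions made up for illustration; they are not part of the original answer.

```python
def is_wanted(line, exclude):
    """Return True if line contains 'http://' and none of the excluded substrings."""
    return 'http://' in line and not any(x in line for x in exclude)

# Hypothetical sample data, for illustration only.
string_list = [
    'see http://example.com/page',
    'http://example.com/lulz',
    'http://example.com/cat.png',
    'no link here',
]
exclude = ['lulz', 'lmfao', '.png']

matching_lines = [line for line in string_list if is_wanted(line, exclude)]
print(matching_lines)  # ['see http://example.com/page']
```

Passing `exclude` as a parameter keeps the function free of global state, so the same check can be reused with different exclusion lists.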

answered Mar 8, 2012 at 1:07

Andrew Clark


3 Comments

directedition Over a year ago

Awesome! I get to use lambda! I knew it existed for some reason!

Karl Knechtel Over a year ago

You don't have to. lambda allows you to define the function inline instead of setting up a variable filter_func; but you could just as easily write def filter_func(s): return 'http://' in s and not any(x in s for x in exclude). Remember, functions are objects.

wim Over a year ago

I would even say this is an inappropriate use of lambda. There is no reason to prefer it to a def here.

srgerg · Accepted Answer · 2012-03-08 01:10:18Z

3

This is almost equivalent to F.J's solution, but it uses generator expressions instead of a lambda expression and the filter function, and it requires the string to start with 'http://' rather than merely contain it:

haystack = ['http://blah', 'http://lulz', 'blah blah', 'http://lmfao']
exclude = ['lulz', 'lmfao', '.png']

http_strings = (s for s in haystack if s.startswith('http://'))
result_strings = (s for s in http_strings if not any(e in s for e in exclude))

print(list(result_strings))

When I run this it prints:

['http://blah']

answered Mar 8, 2012 at 1:10

srgerg


1 Comment

lvc Over a year ago

+1 for generators. But, note that you can do this as a(n almost) one-liner: result_strings = [s for s in haystack if s.startswith('http://') and not any(e in s for e in exclude)]. It needs a line break to fit 80 columns (per most style guides), but I would argue it is slightly easier to follow than the two-generator version. timeit also reports that this is a fair bit faster, and also slightly faster than F.J's filter version (which, IMO, is the hardest to follow of the three).
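A rough way to check the timing claim on your own machine is sketched below. The function names, the enlarged sample list, and the repeat count are all assumptions chosen for illustration; absolute numbers will vary by Python version and hardware, so no timings are quoted here.

```python
import timeit

# Hypothetical test data: the sample list repeated to make timings measurable.
haystack = ['http://blah', 'http://lulz', 'blah blah', 'http://lmfao'] * 1000
exclude = ['lulz', 'lmfao', '.png']

def two_generators():
    http_strings = (s for s in haystack if s.startswith('http://'))
    return [s for s in http_strings if not any(e in s for e in exclude)]

def one_comprehension():
    return [s for s in haystack
            if s.startswith('http://') and not any(e in s for e in exclude)]

def filter_version():
    keep = lambda s: 'http://' in s and not any(x in s for x in exclude)
    return list(filter(keep, haystack))

# Sanity check: all three approaches agree on this data.
assert two_generators() == one_comprehension() == filter_version()

for func in (two_generators, one_comprehension, filter_version):
    seconds = timeit.timeit(func, number=100)
    print(f'{func.__name__}: {seconds:.3f}s')
```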

Pablo Santa Cruz · Accepted Answer · 2012-03-08 01:13:37Z

2

Try this:

for s in strings:
    if 'http://' in s and 'lulz' not in s and 'lmfao' not in s and '.png' not in s:
        # found it
        pass

Another option, if you need the list of excluded words to be more flexible:

words = ('lmfao', '.png', 'lulz')
for s in strings:
    # Check each excluded word against the whole string s.
    if 'http://' in s and all(map(lambda x, y: x not in y, words, [s] * len(words))):
        # found it
        pass
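If the exclusion list grows long, one alternative (not used in the answer above) is to compile the words into a single regular expression. This is only a sketch under assumed sample data; `re.escape` is used so that characters like the dot in '.png' are matched literally.

```python
import re

exclude = ['lulz', 'lmfao', '.png']

# Build one alternation pattern; re.escape keeps '.' from acting as a wildcard.
# Note: an empty exclude list would yield an empty pattern that matches every
# string, so guard against that case in real code.
excluded = re.compile('|'.join(re.escape(word) for word in exclude))

# Hypothetical sample data, for illustration only.
strings = ['http://blah', 'http://lulz', 'blah blah', 'http://a/b.png']

found = [s for s in strings if 'http://' in s and not excluded.search(s)]
print(found)  # ['http://blah']
```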

edited Mar 8, 2012 at 1:13

answered Mar 8, 2012 at 1:03

Pablo Santa Cruz


2 Comments

directedition Over a year ago

That was my first approach. But as my list grew and the line became unwieldy, I was hoping there was a better way.

prelic Over a year ago

That could get out of hand if he ever wanted to extend the list of stop words. How would you change your approach? But still, +1 for simple solutions.
