How can I check whether a string pattern, in this case C, occurs within any element of this set, without pulling out each element and inspecting it?

This test fails, and I am not sure why. My guess is that Python is checking if any element in the set is C, instead of if any element contains C:

In [1]: seto = set()

In [2]: seto.add('C123.45.32')

In [3]: seto.add('C2345.345.32')

In [4]: 'C' in seto
Out[4]: False

I know that I can iterate over the set to make this check:

In [11]: for x in seto:
    if 'C' in x:
        print(x)
   ....:         
C2345.345.32
C123.45.32

But that is not what I am looking to do in this case. Ok thanks for the help!

Edit

I am sorry, these are set operations, not list operations as my original post implied.

4 Comments
  • How are you going to find out, if you don't look at each value? Commented Aug 12, 2013 at 20:35
  • Iteration may well happen under the hood, but for my code I just want to ask something like x in y. Commented Aug 12, 2013 at 20:36
  • So you are looking for a partial match instead of a complete match? Write a partialIn function that iterates over the list and does the comparison; the in operator tries to match the entire element. Commented Aug 12, 2013 at 20:37
  • @RobertHarvey thanks, that would work too, I am sure. I think this was down to my not being familiar enough with Python yet to think of the any() function. Commented Aug 12, 2013 at 20:52

3 Answers

3
'C' in seto

This checks whether any member of seto is the exact string 'C'. Not a substring, but exactly that string. To check for a substring, you'll want to iterate over the set and perform a check on each item.

any('C' in item for item in seto)

The exact nature of the test can be easily changed. For instance, if you want to be stricter about where C can appear:

any(item.startswith('C') for item in seto)
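
As a minimal sketch of the difference between the two checks, using the set from the question (the helper name contains_substring is hypothetical, introduced only for illustration):

```python
# Set from the question.
seto = {'C123.45.32', 'C2345.345.32'}

def contains_substring(items, pattern):
    """Return True if any string in items contains pattern (hypothetical helper)."""
    # any() stops at the first match, so it does not always scan the whole set.
    return any(pattern in item for item in items)

print('C' in seto)                      # False: no element is exactly 'C'
print(contains_substring(seto, 'C'))    # True: at least one element contains 'C'
print(contains_substring(seto, 'X'))    # False: no element contains 'X'
```

The iteration still happens, but it is hidden behind a single call, which gets close to the x in y style the question asks for.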

2 Comments

@Houdini I've edited my post to match. The solution for sets is the same as for lists.
Nice, thank you. I had heard of the any() function, but I have not had the chance to use it or look at it much yet. Looks like a good time to try :)
2

Taking John's answer one stage further, if you want to use the subset of items containing C:

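# Collect every item that contains 'C' (empty set if there are none).
items_with_c = {item for item in seto if 'C' in item}
if items_with_c:
    do_something_with(items_with_c)   # placeholder for your own function
else:
    print("No items contain C")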


1

The other solutions you've been given are correct, understandable, and good Python, and they are reasonably performant if your set is small.

It is, however, possible to do what you want much more quickly using an index (at, of course, a considerable cost in memory and setup time; TANSTAAFL). Lookup time in this index stays roughly constant no matter how big your data gets (assuming you have enough memory to hold it all). If you're doing a lot of lookups, this can make your script a lot faster. And the memory isn't as bad as it could be...

We'll build a dict in which the keys are every possible substring of the indexed items, and each value is the set of items that contain that substring.

from collections import defaultdict

class substring_index(defaultdict):

    def __init__(self, seq=()):
        defaultdict.__init__(self, set)
        for item in seq:
            self.add(item)

    def add(self, item):
        assert isinstance(item, str)   # requires strings
        if item not in self[item]:     # performance optimization for duplicates
            size = len(item) + 1
            for chunk in range(1, size):
                for start in range(0, size-chunk):
                    self[item[start:start+chunk]].add(item)

seto = substring_index()
seto.add('C123.45.32')
seto.add('C2345.345.32')

print(len(seto))      # 97 entries for 2 items, I wasn't kidding about the memory

Now you can easily (and instantly) test to see whether any substring is in the index:

print('C' in seto)    # True

Or you can easily find all strings that contain a particular substring:

print(seto['C'])      # set(['C2345.345.32', 'C123.45.32'])

This can be pretty easily extended to include "starts with" and "ends with" matches, too, or to be case-insensitive.
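
One way such an extension might look, as a standalone sketch rather than a drop-in change to the class above. The function name build_indexes is hypothetical, and lowercasing the keys is just one assumed way of getting case-insensitive matches:

```python
from collections import defaultdict

def build_indexes(items):
    """Build case-insensitive substring and prefix indexes (illustrative sketch)."""
    substrings = defaultdict(set)   # lowercased substring -> original items
    prefixes = defaultdict(set)     # lowercased prefix -> original items
    for item in items:
        key = item.lower()
        for end in range(1, len(key) + 1):
            prefixes[key[:end]].add(item)
            for start in range(end):
                substrings[key[start:end]].add(item)
    # Convert to plain dicts so a failed lookup cannot create an empty entry.
    return dict(substrings), dict(prefixes)

substrings, prefixes = build_indexes(['C123.45.32', 'C2345.345.32'])

print('c' in substrings)                   # True: case-insensitive substring match
print(sorted(prefixes.get('c2', ())))      # ['C2345.345.32']
print(sorted(substrings.get('.32', ())))   # ['C123.45.32', 'C2345.345.32']
print('45' in prefixes)                    # False: no item starts with '45'
```

An "ends with" index would follow the same pattern with key[start:] as the dict key. Queries need to be lowercased the same way before lookup.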

For a less memory-intensive version of the same idea, look into tries.
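
A rough sketch of the trie idea, assuming a plain nested-dict suffix trie (the function names are hypothetical). It only answers whether a substring exists and does not return the matching items, and whether it actually saves memory in practice depends on the data and on the node representation:

```python
def build_suffix_trie(items):
    """Insert every suffix of every item into a nested-dict trie."""
    root = {}
    for item in items:
        for start in range(len(item)):
            node = root
            for char in item[start:]:
                # Reuse the child node if it exists, otherwise create it.
                node = node.setdefault(char, {})
    return root

def has_substring(trie, pattern):
    """Walk the trie one character at a time; each path from the root spells a substring."""
    node = trie
    for char in pattern:
        if char not in node:
            return False
        node = node[char]
    return True

trie = build_suffix_trie(['C123.45.32', 'C2345.345.32'])
print(has_substring(trie, 'C'))       # True
print(has_substring(trie, '345.3'))   # True
print(has_substring(trie, 'C9'))      # False
```

Substrings that share a prefix share nodes, so each distinct substring costs one node holding a single character instead of a separate full-length dict key.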
