Find string in list of strings

Question

I have a set of lists of items (FreqItemsets), for example:

FreqItemset(items=[u'bbb_1', u'ccc_1', u'ccc_2', u'aaa_1', u'ccc_3'], freq=379)
FreqItemset(items=[u'aaa_1_1', u'ccc_1', u'ccc_2', u'ccc_3'], freq=375)
...

I am trying to find, in each FreqItemset, an item that starts with aaa.

I know how to check whether the first element of the list starts with aaa:

filtered_result = model.freqItemsets()\
 .filter(lambda x: x.items[0].startswith('aaa_')).collect()

The question is how to check every element of a FreqItemset for aaa.

In the first line of the example above, the aaa string is in fourth place.

I thought about something like this:

   filtered_result = model.freqItemsets()\
     .filter(lambda x: x.items[0].startswith('aaa_'))\
     .filter(lambda x: x.items[1].startswith('aaa_'))\
     .filter(lambda x: x.items[2].startswith('aaa_'))\
     ...
     .collect()

Is this the most efficient way?

Those items are not lists of sets; they are lists of unicode strings. Do you know how to do this with a 'normal' list (Python's built-in list type)? That would be a starting point, in combination with list comprehensions.
– albert, Commented Feb 7, 2016 at 10:19

albert · Accepted Answer · 2016-02-07 10:33:41Z

1

Since I do not have the FreqItemset datatype available, I am just demonstrating a general approach using Python's built-in list type:

list_1 = [u'bbb_1', u'ccc_1', u'ccc_2', u'aaa_1', u'ccc_3']
list_2 = [u'aaa_1_1', u'ccc_1', u'ccc_2', u'ccc_3']

results_1 = [s for s in list_1 if s.startswith('aaa')]
results_2 = [s for s in list_2 if s.startswith('aaa')]

print(results_1)
print(results_2)

I am using Python 3 and it looks like you're using Python 2, where print is a statement. You can write print something there, although print(something) with a single argument also works in Python 2.

Note: You can adapt this general approach to make all of this less manual, e.g. by iterating over a list of lists (or FreqItemsets in your case), or by writing the results into a dictionary that uses e.g. the different frequencies as keys.
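
The adaptation described in the note could be sketched roughly as follows, in plain Python. The FreqItemset namedtuple here is a hypothetical stand-in for the Spark type, assumed to expose the items and freq fields shown in the question's output; the names itemsets, PREFIX and matches_by_freq are made up for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for Spark's FreqItemset, with the two fields
# that appear in the question's output.
FreqItemset = namedtuple("FreqItemset", ["items", "freq"])

itemsets = [
    FreqItemset(items=[u'bbb_1', u'ccc_1', u'ccc_2', u'aaa_1', u'ccc_3'], freq=379),
    FreqItemset(items=[u'aaa_1_1', u'ccc_1', u'ccc_2', u'ccc_3'], freq=375),
]

PREFIX = 'aaa_'

# Map each frequency to the items of that itemset that start with the prefix.
# Assumption: frequencies are unique; if two itemsets share a frequency,
# the later one overwrites the earlier one.
matches_by_freq = {
    itemset.freq: [s for s in itemset.items if s.startswith(PREFIX)]
    for itemset in itemsets
}

print(matches_by_freq)
```

If frequencies can repeat, a list of (freq, matches) pairs would be a safer container than a dictionary.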

answered Feb 7, 2016 at 10:33

albert


2 Comments

Toren Over a year ago

Could it be something like .filter(s for s in items if items.startswith('aaa_'))?

albert Over a year ago

I have never used any filter() function. Sorry for that.

bereal · Accepted Answer · 2016-02-09 14:17:31Z

1

If I understand you correctly, you want to keep only those elements whose items all start with a certain string. That looks like a job for all():

itemsets.filter(lambda x: all(i.startswith('aaa_') for i in x.items))

I'd rather extract that into a named function:

def is_good(itemset):
    # True only if every item in the itemset starts with the prefix
    return all(i.startswith('aaa_') for i in itemset.items)

itemsets.filter(is_good)

edited Feb 9, 2016 at 14:17

answered Feb 9, 2016 at 8:31

bereal


5 Comments

Toren Over a year ago

@bereal Thanks a lot for answering. Unfortunately I get an error: AttributeError: 'list' object has no attribute 'startswith'

bereal Over a year ago

@Toren fixed, should be x.items in the loop.

Toren Over a year ago

Many thanks, nice solution! Thank you! Please fix the solution: change all to any. I'll vote then. Nice work.

bereal Over a year ago

@Toren applying multiple filters is an AND operation, therefore it's all, not any.

Toren Over a year ago

all does not work in my case, because it keeps only the lists in which every element starts with aaa_. any works, because any element in the list can start with aaa_. I've verified this in my code.
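
The disagreement in these comments comes down to any() versus all(). Here is a small sketch in plain Python, under some assumptions: a hypothetical namedtuple stands in for FreqItemset, the built-in filter() stands in for the RDD's filter method so the example runs without Spark, and the last two rows and the function names are made up for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for Spark's FreqItemset.
FreqItemset = namedtuple("FreqItemset", ["items", "freq"])

itemsets = [
    FreqItemset(items=[u'bbb_1', u'ccc_1', u'ccc_2', u'aaa_1', u'ccc_3'], freq=379),
    FreqItemset(items=[u'aaa_1_1', u'ccc_1', u'ccc_2', u'ccc_3'], freq=375),
    FreqItemset(items=[u'aaa_1', u'aaa_2'], freq=10),  # made-up row: every item matches
    FreqItemset(items=[u'bbb_1', u'ccc_1'], freq=5),   # made-up row: no item matches
]

def has_any_aaa(itemset):
    # True if at least one item starts with the prefix, at any position.
    return any(i.startswith('aaa_') for i in itemset.items)

def has_only_aaa(itemset):
    # True only if every item starts with the prefix.
    return all(i.startswith('aaa_') for i in itemset.items)

with_any = list(filter(has_any_aaa, itemsets))
with_all = list(filter(has_only_aaa, itemsets))

print([x.freq for x in with_any])  # itemsets containing at least one aaa_ item
print([x.freq for x in with_all])  # itemsets consisting solely of aaa_ items
```

On the RDD from the question, the same predicate would presumably be passed as model.freqItemsets().filter(has_any_aaa).collect(), mirroring the call shown there.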
