0

I have a collection of FreqItemset objects, each holding a list of items, for example:

FreqItemset(items=[u'bbb_1', u'ccc_1', u'ccc_2', u'aaa_1', u'ccc_3'], freq=379)
FreqItemset(items=[u'aaa_1_1', u'ccc_1', u'ccc_2', u'ccc_3'], freq=375)
...

In each FreqItemset I am trying to find an item that starts with aaa.

I know how to check the first element of the list for aaa:

filtered_result = model.freqItemsets()\
 .filter(lambda x: x.items[0].startswith('aaa_')).collect()

The question is how to check every element of a FreqItemset for aaa.

In the first line of the example above, the aaa string is in the fourth place.

I thought about something like this:

   filtered_result = model.freqItemsets()\
     .filter(lambda x: x.items[0].startswith('aaa_'))
     .filter(lambda x: x.items[1].startswith('aaa_'))
     .filter(lambda x: x.items[2].startswith('aaa_'))
     ...
     .collect()

Is this the most efficient way?

2
  • Those items are not lists of sets. They are lists of unicode strings. Do you know how to do this with a 'normal' list (Python's default list datatype)? Combined with so-called list comprehensions, that would be a starting point. Commented Feb 7, 2016 at 10:19
  • @albert if you know the answer please provide it Commented Feb 7, 2016 at 10:27

2 Answers

1

Since I do not have the FreqItemset datatype available, I am just demonstrating a general approach using Python's default list datatype:

list_1 = [u'bbb_1', u'ccc_1', u'ccc_2', u'aaa_1', u'ccc_3']
list_2 = [u'aaa_1_1', u'ccc_1', u'ccc_2', u'ccc_3']

results_1 = [s for s in list_1 if s.startswith('aaa')]
results_2 = [s for s in list_2 if s.startswith('aaa')]

print(results_1)
print(results_2)

I am using Python 3 and it looks like you're using Python 2. There, print(something) with a single argument works as well, or you can write print something.

Note: You can adapt this general approach to make all of this less manual, e.g. by iterating over a list of lists (or FreqItemsets in your case), or by writing the results into a dictionary that uses e.g. the different frequencies as keys.
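The note above could be sketched roughly as follows. This is only an illustration: the namedtuple is a hypothetical stand-in for FreqItemset that models just the items and freq fields shown in the question, and matches_by_freq is a made-up name.

```python
from collections import namedtuple

# Hypothetical stand-in for FreqItemset (assumption: only items and freq matter here).
FreqItemset = namedtuple("FreqItemset", ["items", "freq"])

itemsets = [
    FreqItemset(items=[u'bbb_1', u'ccc_1', u'ccc_2', u'aaa_1', u'ccc_3'], freq=379),
    FreqItemset(items=[u'aaa_1_1', u'ccc_1', u'ccc_2', u'ccc_3'], freq=375),
]

# Map each frequency to the items of that itemset that start with 'aaa'.
# Caveat: two itemsets with the same freq would overwrite each other in this dict.
matches_by_freq = {
    fs.freq: [s for s in fs.items if s.startswith('aaa')]
    for fs in itemsets
}

print(matches_by_freq)
```

If several itemsets can share a frequency, collecting the results in a list of (freq, matches) pairs instead of a dictionary would avoid losing entries.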


2 Comments

could it be like .filter(s for s in items if items.startswith('aaa_')) ?
I have never used any filter() function. Sorry for that.
1

If I understand you correctly, you want to keep only those itemsets consisting solely of certain strings. This looks like a job for all():

itemsets.filter(lambda x: all(i.startswith('aaa_') for i in x.items))

Which I'd rather extract to a new function:

def is_good(itemset):
    return all(i.startswith('aaa_') for i in itemset.items)

itemsets.filter(is_good)
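To see how all() and any() differ as filter predicates, here is a minimal sketch on plain Python data. The namedtuple is a hypothetical stand-in for FreqItemset, the third sample entry is invented for illustration, and the helper names are not from any library.

```python
from collections import namedtuple

# Hypothetical stand-in for FreqItemset (assumption: only items and freq matter here).
FreqItemset = namedtuple("FreqItemset", ["items", "freq"])

def every_item_matches(itemset, prefix='aaa_'):
    # True only if ALL items start with the prefix.
    return all(i.startswith(prefix) for i in itemset.items)

def some_item_matches(itemset, prefix='aaa_'):
    # True if AT LEAST ONE item starts with the prefix.
    return any(i.startswith(prefix) for i in itemset.items)

sample = [
    FreqItemset(items=[u'bbb_1', u'ccc_1', u'ccc_2', u'aaa_1', u'ccc_3'], freq=379),
    FreqItemset(items=[u'aaa_1_1', u'ccc_1', u'ccc_2', u'ccc_3'], freq=375),
    FreqItemset(items=[u'bbb_1', u'ccc_1'], freq=100),  # invented entry with no 'aaa_' item
]

kept_by_all = [fs for fs in sample if every_item_matches(fs)]
kept_by_any = [fs for fs in sample if some_item_matches(fs)]

print(len(kept_by_all), len(kept_by_any))
```

On an RDD, either predicate could presumably be passed to filter() in the same way as is_good above; which one fits depends on whether every item or just one item has to carry the prefix.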

5 Comments

@berael Thanks a lot for answering. Unfortunately I get an error: AttributeError: 'list' object has no attribute 'startswith'
@Toren fixed, should be x.items in the loop.
Many thanks, nice solution! Thank you! Please fix the solution: change all to any. I'll vote then. Nice work.
@Toren applying multiple filters is an AND operation, therefore it's all, not any.
all does not work in my case, because it keeps only the lists in which every element starts with aaa_. any works, because any element in the list can start with aaa_. I've verified this in my code.
