
After performing some operations, I get a list like the following:

FreqItemset(items=[u'A_String_0'], freq=303)
FreqItemset(items=[u'A_String_0', u'Another_String_1'], freq=302)
FreqItemset(items=[u'B_String_1', u'A_String_0', u'A_OtherString_1'], freq=301)

I'd like to remove from the list every item that starts with A_String_0, but keep the other items (it doesn't matter if A_String_0 appears in the middle or at the end of an item).

So, in the example above, delete lines 1 and 2 and keep line 3.

I tried

 filter(lambda a: a != 'A_String_0', result)

and

result.remove('A_String_0')

Neither of these works for me.

4
  • The second method works for me. Commented Dec 16, 2015 at 16:01
  • What do you mean by "I'd like to remove from list all items start from A_String_0"? Commented Dec 16, 2015 at 16:04
  • He wants to remove 'A_String_0' if it's the first element in the list, else leave it alone Commented Dec 16, 2015 at 16:04
  • I see function calls, not lists Commented Dec 16, 2015 at 16:12

3 Answers

2

It is as simple as this:

from pyspark.mllib.fpm import FPGrowth

sets = [
    FPGrowth.FreqItemset(
       items=[u'A_String_0'], freq=303),
    FPGrowth.FreqItemset(
        items=[u'A_String_0', u'Another_String_1'], freq=302),
    FPGrowth.FreqItemset(
        items=[u'B_String_1', u'A_String_0', u'A_OtherString_1'], freq=301)
]

[x for x in sets if x.items[0] != 'A_String_0']
## [FreqItemset(items=['B_String_1', 'A_String_0', 'A_OtherString_1'], freq=301)]

In practice it would be better to filter before collecting, so the unwanted itemsets are never brought back to the driver:

filtered_sets = (model
    .freqItemsets()
    .filter(lambda x: x.items[0] != 'A_String_0')
    .collect())

2 Comments

Can you please provide an example? What if I'd like to search for 'A_S*' instead of 'A_String_0'?
x.items[0].startswith("A_S")
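To make the prefix variant concrete, here is a minimal sketch that runs without Spark. It assumes a plain `namedtuple` stand-in for `FPGrowth.FreqItemset` with the same `items` and `freq` fields, and the helper name `drop_by_first_item_prefix` is hypothetical, introduced only for illustration.

```python
from collections import namedtuple

# Assumption: a stand-in for pyspark.mllib.fpm.FPGrowth.FreqItemset,
# so that the sketch runs without a Spark installation.
FreqItemset = namedtuple("FreqItemset", ["items", "freq"])

sets = [
    FreqItemset(items=["A_String_0"], freq=303),
    FreqItemset(items=["A_String_0", "Another_String_1"], freq=302),
    FreqItemset(items=["B_String_1", "A_String_0", "A_OtherString_1"], freq=301),
]


def drop_by_first_item_prefix(itemsets, prefix):
    """Keep only the itemsets whose first item does not start with prefix."""
    return [x for x in itemsets if not x.items[0].startswith(prefix)]


kept = drop_by_first_item_prefix(sets, "A_S")
print(kept)
```

Only the first element of `items` is checked, so an itemset that contains a matching string in a later position is kept. With an RDD, the same predicate could presumably be passed to `.filter(...)` before `.collect()`, as in the answer above.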
2

How about result = result if result[0] != 'A_String_0' else result[1:]?

Comments

2

It seems that you are using a list called FreqItemset. However, the name suggests that you should be using a set, instead of a list.

This way, you could have a searchable collection of (string, frequency) pairs, for example as a dictionary:

>>> d = { "the": 2, "a": 3 }
>>> d[ "the" ]
2
>>> d[ "the" ] = 4
>>> d[ "a" ]
3
>>> del d[ "a" ]
>>> d
{'the': 4}

You can easily access each word (a key of the dictionary), change its value (its frequency of occurrence), or remove it. Because it is a dictionary, none of these operations needs to scan all the elements, so lookups by key perform better than searching a list.
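Applied to the itemsets from the question, the idea might look like the sketch below. It assumes a `namedtuple` stand-in for `FreqItemset` and converts each `items` list to a tuple, since lists are not hashable and cannot be dictionary keys. The names `freq_by_items`, `lookup`, and `remaining` are hypothetical.

```python
from collections import namedtuple

# Assumption: a stand-in for Spark's FreqItemset with the same two fields.
FreqItemset = namedtuple("FreqItemset", ["items", "freq"])

sets = [
    FreqItemset(items=["A_String_0"], freq=303),
    FreqItemset(items=["A_String_0", "Another_String_1"], freq=302),
    FreqItemset(items=["B_String_1", "A_String_0", "A_OtherString_1"], freq=301),
]

# Lists are not hashable, so each items list becomes a tuple key.
freq_by_items = {tuple(x.items): x.freq for x in sets}

# Exact lookup of one itemset's frequency by key.
lookup = freq_by_items[("A_String_0", "Another_String_1")]

# Dropping entries by their first item still requires a pass over all keys.
remaining = {k: v for k, v in freq_by_items.items() if k[0] != "A_String_0"}
print(lookup, remaining)
```

Note that the dictionary speeds up exact lookups of a whole itemset, but filtering on the first item, as the question asks, still visits every entry.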

Just my two cents.

4 Comments

Thanks a lot for the help, I'll try it. About the type of the itemsets: when I execute "print type(result)" I get a list (result = model....).
Do you mean you cannot change it?
As I understand it, it's a list of sets.
You should use the most appropriate data structure. If a list of sets does not suit you, then change it to a simple set.
