I have a big iterable and want to filter it using the filter() function.
How can I count (in some elegant way) how many items are filtered?
(The same question could apply to map(), reduce(), etc.)

Sure, I can just write:

items = get_big_iterable()
count_good = 0
count_all = 0
for item in items:
    if should_keep(item):
        count_good += 1
    count_all += 1

print('keep: {} of {}'.format(count_good, count_all))

Is it somehow possible with filter()?

items = filter(should_keep, get_big_iterable())
for item in items:
    # ... use the values here ...
    # is it possible to also count the filtered-out items here?
    pass

I should not iterate twice, and I would like to use filter() or a similar solution.

4 Answers

3

Another option is to use the sum() function, e.g.:

count = sum(1 for x in get_big_iterable() if should_keep(x))
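The one-liner above yields only the number of kept items. If the total is needed from the same single pass, one possible sketch is to let enumerate() track the total while a running sum tracks the kept items. Here should_keep and get_big_iterable are stand-ins with made-up bodies, not the asker's real functions:

```python
def should_keep(x):
    # Hypothetical predicate, for illustration only.
    return x % 3 == 0


def get_big_iterable():
    # Hypothetical data source; returns a one-shot iterator on purpose.
    return iter(range(1, 28))


count_all = 0   # stays 0 if the iterable is empty
count_good = 0
for count_all, item in enumerate(get_big_iterable(), 1):
    # bool() guards against predicates that return non-boolean truthy values.
    count_good += bool(should_keep(item))

print('keep: {} of {}'.format(count_good, count_all))
```

This still iterates only once and never builds a list, but it does not hand the kept items to the caller.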

1 Comment

This is faster because it avoids creating a list.
2

It should be pretty straightforward with enumerate, and some basic arithmetic:

def should_keep(x):
    return x % 3 == 0

items = range(1, 28)


def _wrapper(x):
    return should_keep(x[1])

filtered_with_counts = enumerate(filter(_wrapper, enumerate(items, 1)), 1)

for i, (j, item) in filtered_with_counts:
    # do something with item
    print(f"Item is {item}, total: {j}, good: {i}, bad: {j-i}")

count_all = j
count_good = i
count_bad = count_all - count_good
print(f"Final: {count_all}, {count_good}, {count_bad}")

Output:

Item is 3, total: 3, good: 1, bad: 2
Item is 6, total: 6, good: 2, bad: 4
Item is 9, total: 9, good: 3, bad: 6
Item is 12, total: 12, good: 4, bad: 8
Item is 15, total: 15, good: 5, bad: 10
Item is 18, total: 18, good: 6, bad: 12
Item is 21, total: 21, good: 7, bad: 14
Item is 24, total: 24, good: 8, bad: 16
Item is 27, total: 27, good: 9, bad: 18
Final: 27, 9, 18

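I probably wouldn't use this, though. After the loop, j is the position of the last kept item, so count_all is only correct when the final item passes the filter, and i and j are undefined if nothing passes. Note that I assume you may not want to modify should_keep, but you can always wrap it.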
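As a rough illustration of the wrapping idea, the predicate itself can be wrapped so that it tallies every call, which lets the built-in filter() be used unchanged. The helper name counting and its attributes are hypothetical, chosen only for this sketch:

```python
def counting(predicate):
    """Wrap predicate so every call is tallied on the wrapper's attributes."""
    def wrapper(item):
        wrapper.count_all += 1
        result = bool(predicate(item))
        wrapper.count_good += result  # True adds 1, False adds 0
        return result

    wrapper.count_all = 0
    wrapper.count_good = 0
    return wrapper


def should_keep(x):
    return x % 3 == 0


keep = counting(should_keep)
kept = list(filter(keep, range(1, 28)))  # any consumer of the filter works here
print('keep: {} of {}'.format(keep.count_good, keep.count_all))
```

Because the counts live on the wrapper, they include items rejected after the last kept one, and both stay at 0 for an empty input.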

1 Comment

I think the other answers are more useful for real systems, but this fits best as an answer to my question :)
2

There are two ways I can think of. The first one is short, but it is probably bad for performance and defeats the purpose of having an iterator:

count = len(list(your_filtered_iterable))

Another way is to write your own filter. Per Python documentation:

Note that filter(function, iterable) is equivalent to the generator expression (item for item in iterable if function(item)) if function is not None and (item for item in iterable if item) if function is None.
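A quick way to see that equivalence is to compare both forms on a small list. The predicate and sample data below are made up for the check:

```python
def should_keep(x):
    # Hypothetical predicate, for illustration only.
    return x % 3 == 0


numbers = [0, 1, 2, 3, 4, 5, 6]

# With a function: same items as the generator expression with the predicate.
with_func = list(filter(should_keep, numbers))
assert with_func == list(item for item in numbers if should_keep(item))

# With None: falsy items (here, 0) are dropped.
with_none = list(filter(None, numbers))
assert with_none == list(item for item in numbers if item)

print(with_func, with_none)
```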

So you can write something like this:

class Filter:
    def __init__(self, func, iterable):
        self.count_good = 0
        self.count_all = 0
        self.func = func
        self.iterable = iterable

    def __iter__(self):
        if self.func is None:
            for obj in self.iterable:
                if obj:
                    self.count_good += 1
                    self.count_all += 1
                    yield obj
                else:
                    self.count_all += 1
        else:
            for obj in self.iterable:
                if self.func(obj):
                    self.count_good += 1
                    self.count_all += 1
                    yield obj
                else:
                    self.count_all += 1

Then you can access the count_good and count_all from the Filter instance:

items = Filter(should_keep, get_big_iterable())
for item in items:
    # do whatever you need with item
    # the counts are running totals up to the current item
    print('keep: {} of {}'.format(items.count_good, items.count_all))

1

The built-in filter does not provide that. You need to write your own filter class that implements the __iter__ and __next__ methods.

Code

class FilterCount:
    def __init__(self, function, iterable):
        self.function = function
        self.iterable = iter(iterable)
        self.countTrue, self.countFalse = 0, 0

    def __iter__(self):
        return self

    def __next__(self):
        nxt = next(self.iterable)
        while not self.function(nxt):
            self.countFalse += 1
            nxt = next(self.iterable)

        self.countTrue += 1
        return nxt

Example

lst = ['foo', 'foo', 'bar']
filtered_lst = FilterCount(lambda x: x == 'foo', lst)

for x in filtered_lst:
    print(x)
print(filtered_lst.countTrue)
print(filtered_lst.countFalse)

Output

foo
foo
2
1

4 Comments

I need to iterate over the filtered values too, not only count them. How do I access them in this case?
@ya_dimon I see what you need now; see the updated answer.
Would you explain why it's different from @pkqxdd's answer (stackoverflow.com/a/51852326/2519073)? Do we need __next__ and not just __iter__? (It's another question, I know, but just in case you have time.)
@ya_dimon A solution using __next__ is stateful: if you partially iterate over it and then come back to it, you continue from where you left off. A solution using __iter__ alone is stateless: whenever you start iterating over it, you go back to the start. In that regard, this solution is closer to filter.
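The difference described in that last comment can be seen in a small sketch. RestartingFilter is a hypothetical __iter__-only class written just for this comparison, and the restart behaviour assumes the underlying data can be iterated more than once (a list here):

```python
class RestartingFilter:
    """Hypothetical __iter__-only filter: each iter() call starts a new pass."""

    def __init__(self, func, iterable):
        self.func = func
        self.iterable = iterable

    def __iter__(self):
        return (x for x in self.iterable if self.func(x))


def is_even(x):
    return x % 2 == 0


data = [1, 2, 3, 4, 5, 6]

# The built-in filter is its own iterator, so it remembers its position.
stateful = filter(is_even, data)
first = next(iter(stateful))
rest = list(stateful)            # continues after the first kept item

# The __iter__-only class hands out a fresh generator every time.
restarting = RestartingFilter(is_even, data)
first_again = next(iter(restarting))
everything = list(restarting)    # starts over from the beginning

print(first, rest, first_again, everything)
```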
