Looking for faster way to filter integer from string - Python 3.3

Question

I am trying to dissect an integer from data gathered by another beautifulsoup script I wrote. The data I get is always one of the three following:

<div id="counts"> 500 hits </div>
<div id="counts">3 hits </div>
<div id="counts"> hits </div>

The number of hits varies and is sometimes attached to the ">" and sometimes not. Other times the integer isn't there at all. So I wrote this script to return ONLY the number from the data (or tell me there is no number). It seems clunky and slow, and I feel like there should be a faster way to do it. (In this code example, I set 'search' to one of the 3 possible outcomes of the BeautifulSoup scrape.)

keywords = ['hits']
results = []
search = '<div id="hits"> 3 hits </div>'

num_check = False
store_next = False
words = search.split()

def is_number(results, num_check):
    while num_check <= 0:
        try:
            float(results[0])
            num_check = True
        except ValueError:
            results[0] = ''.join(filter(lambda x: x.isdigit(), results[0]))
            if results[0] == '':
                num_check = 2
    if num_check <= 1:
        print(results[0])

for word in reversed(words):
    if store_next:
        results.append(word)
        store_next = False
    elif word in keywords:
        store_next = True

is_number(results, num_check)

EDIT: sometimes (rarely) the <div></div> contains more info, such as a ping speed (0.22 seconds), which is why I can't search the entire clause for integers.

Not really an answer, but fyi, ''.join(filter(lambda x: x.isdigit(), results[0])) can be rewritten to simply filter(str.isdigit, results[0])
– mhlester, Commented Jan 31, 2014 at 20:32
It seems better to have your other script generate just the text of each tag instead of the repr of the whole Tag, no?
– roippi, Commented Jan 31, 2014 at 20:39
That doesn't seem to work. I get a TypeError: float() argument must be a string or a number on line 12 after it filters. If I try print(filter(str.isdigit, '<div id="hits">3')) I get <filter object at 0x00000000032BA160> printed.
– Gronk, Commented Jan 31, 2014 at 20:44
ideone.com/v35d5Q will show you how to get a string back from filter in python 3 ... in python2 it just stays a string
– Joran Beasley, Commented Jan 31, 2014 at 20:48
@Gronk, sorry for misinforming you. I'm on python2 here. The lambda was unnecessary, but the join apparently was not
– mhlester, Commented Jan 31, 2014 at 20:54
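
To illustrate the point settled in the comments above: in Python 3, filter() returns a lazy iterator, so the join is still needed to get a string back. This is only a small sketch; the variable names are made up for illustration. It also shows why plain digit filtering breaks down once the div carries extra numbers such as a ping time.

```python
# Python 3: filter() yields a lazy iterator, so join it back into a string.
text = '<div id="hits">3'
digits = ''.join(filter(str.isdigit, text))
print(digits)  # prints 3

# With a ping time in the same div, every digit is kept, which is not
# what the question wants (the hit count and the ping digits are merged).
text_with_ping = '<div id="hits"> 3 hits 0.22 seconds </div>'
merged = ''.join(filter(str.isdigit, text_with_ping))
print(merged)  # prints 3022
```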

Joran Beasley · Accepted Answer · 2014-01-31 20:43:15Z

2

Maybe something like this:

import re

search = '<div id="hits"> 3 hits </div>'
re.findall(r"\d+", search)

or, for floats:

re.findall(r"\d+\.?\d*", search)

If you know there's not going to be more than one number at a time, you could do:

re.search(r"(\d+)", search).group(0)

here is some timing info

>>> timeit.timeit("re.search(\"(\d+)\",'<div id=\"hits\"> 3 hits </div>').group(0)","import re",number = 1000)
0.0031895773144583472
>>> timeit.timeit("filter(str.isdigit, '<div id=\"hits\"> 3 hits </div>')",number=1000)
0.0049939576031476918
>>>
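
A note on reproducing the comparison in Python 3: there, a bare filter() call only builds an iterator and does no filtering work until it is consumed, so the second timing above would not measure the same thing. The sketch below joins the result so both approaches produce a string. The function names and SAMPLE are assumptions for illustration, and the timings will vary by machine, so no expected numbers are shown.

```python
import re
import timeit

SAMPLE = '<div id="hits"> 3 hits </div>'

def with_regex(text=SAMPLE):
    # Return the first run of digits, or None if there is none.
    match = re.search(r"\d+", text)
    return match.group(0) if match else None

def with_filter(text=SAMPLE):
    # Consume the lazy filter object so the work is actually done.
    return ''.join(filter(str.isdigit, text))

# timeit.timeit accepts a callable as the statement to time.
regex_time = timeit.timeit(with_regex, number=1000)
filter_time = timeit.timeit(with_filter, number=1000)
print("regex: ", regex_time)
print("filter:", filter_time)
```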

edited Jan 31, 2014 at 20:43

answered Jan 31, 2014 at 20:34

Joran Beasley


4 Comments

Gronk Over a year ago

Sorry, I should have added in my post that occasionally there is ping info like '0.22 seconds' contained in the <div></div>. I have edited the post to add that info. I had not thought of this approach, though, and it would work most of the time.

Joran Beasley Over a year ago

How about re.search("(\d+) hits",search_text)? That should only match the pattern shown.

Gronk Over a year ago

Do you mean re.search("(\d+) hits",search)? If search = '<div id="hits">3,153 hits 476.12 seconds </div>', the re.search call returns <_sre.SRE_Match object at 0x00000000032FA918>

Gronk Over a year ago

Sorry, more info: if I do s = re.search("(\d+) hits",search), then s.group() only gives 153 hits, which means it seems to be stopping at the comma.
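
The comma problem raised in the last comment was left unresolved in the thread. One possible way to handle it, offered only as a sketch: let the pattern accept commas inside the number, strip them before converting, and return None when no number precedes "hits". The names HITS_PATTERN and extract_hits are hypothetical and not from the original answer.

```python
import re

# A digit, then any mix of digits and commas, optional whitespace, then "hits".
# Anchoring on "hits" skips other numbers such as the ping time.
HITS_PATTERN = re.compile(r"(\d[\d,]*)\s*hits")

def extract_hits(text):
    """Return the hit count as an int, or None if no number precedes 'hits'."""
    match = HITS_PATTERN.search(text)
    if match is None:
        return None
    # Drop thousands separators before converting.
    return int(match.group(1).replace(",", ""))

print(extract_hits('<div id="hits">3,153 hits 476.12 seconds </div>'))  # prints 3153
print(extract_hits('<div id="hits"> 500 hits </div>'))                  # prints 500
print(extract_hits('<div id="hits"> hits </div>'))                      # prints None
```

Checking for None before calling group() also covers the case where the integer is missing, which would otherwise raise an AttributeError.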
