I am trying to extract an integer from data gathered by another BeautifulSoup script I wrote. The data I get is always one of the following three:
<div id="counts"> 500 hits </div>
<div id="counts">3 hits </div>
<div id="counts"> hits </div>
The number of hits varies; sometimes it is attached to the ">" and sometimes not, and other times the integer isn't there at all. So I wrote this script to return ONLY the number from the data (or tell me there is no number). It seems clunky and slow, and I feel there should be a faster way to do it. (In this code example, 'search' holds one of the 3 possible outcomes of the bs scrape.)
keywords = ['hits']
results = []
search = '<div id="hits"> 3 hits </div>'
num_check = False
store_next = False
words = search.split()

def is_number(results, num_check):
    while num_check <= 0:
        try:
            float(results[0])
            num_check = True
        except ValueError:
            results[0] = ''.join(filter(lambda x: x.isdigit(), results[0]))
            if results[0] == '':
                num_check = 2
    if num_check <= 1:
        print(results[0])

for word in reversed(words):
    if store_next:
        results.append(word)
        store_next = False
    elif word in keywords:
        store_next = True
is_number(results, num_check)
EDIT: sometimes (rarely) the <div></div> contains more info, such as a ping speed (0.22 seconds), which is why I can't simply search the entire string for integers.
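For comparison, here is a minimal sketch of a shorter approach. It assumes the count, when present, always sits directly before the word "hits". The name extract_hits and the pattern are hypothetical and were only checked against the samples shown in this question, not against real pages.

```python
import re

# One or more digits, optional whitespace, then the keyword "hits".
# A value such as "0.22 seconds" is not followed by "hits", so it is ignored.
HITS_PATTERN = re.compile(r'(\d+)\s*hits\b')

def extract_hits(fragment):
    """Return the integer right before 'hits', or None if there is none."""
    match = HITS_PATTERN.search(fragment)
    return int(match.group(1)) if match else None

samples = [
    '<div id="counts"> 500 hits </div>',
    '<div id="counts">3 hits </div>',
    '<div id="counts"> hits </div>',
    '<div id="counts"> 0.22 seconds 12 hits </div>',
]
for sample in samples:
    print(extract_hits(sample))
```

Because the pattern is anchored on the keyword, other numbers in the same div do not interfere. It would still misread counts with thousands separators or decimals (for example "1,500 hits"), so the pattern would need adjusting if those can occur.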
''.join(filter(lambda x: x.isdigit(), results[0])) can be rewritten to simply filter(str.isdigit, results[0])

[...] text of each tag instead of the repr of the whole Tag, no?

[...] TypeError: float() argument must be a string or a number on line 12 after it filters. If I try print(filter(str.isdigit, '<div id="hits">3')) I get <filter object at 0x00000000032BA160> printed.
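The filter object mentioned in the comments above is expected on Python 3, where filter() returns a lazy iterator rather than a string. A small sketch of the difference, reusing the string from the comment (the variable names are illustrative only):

```python
fragment = '<div id="hits">3'

# On Python 3, filter() returns an iterator, not a str, so passing it
# straight to float() raises TypeError.
lazy = filter(str.isdigit, fragment)

# Joining the iterator builds the string of digit characters.
digits = ''.join(lazy)

# An empty string means no digits were found.
number = int(digits) if digits else None
print(number)
```

So the lambda can be swapped for str.isdigit, but the ''.join(...) call still has to stay on Python 3.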