How to check whether a string is present in a website with Python

Question

I'm trying to identify if a string like "data=sold" is present in a website.

Right now I'm using requests inside a while loop, but I need it to be faster:

  import requests
  response = requests.get(link)
  if 'data=sold' in response.text:
      ...  # react to the match here

It works well, but it is not fast. Is there a way to request only the part of the website I need, to make the search faster?

That completely depends on the website, but most likely not.
– Klaus D., Commented Apr 18, 2019 at 8:33
Is it an HTML attribute, i.e. a data attribute with value = sold?
– QHarr, Commented Apr 18, 2019 at 9:48

Thotsaphon Sirikutta · Accepted Answer · 2019-04-18 08:36:03Z

1

I assume your response.text is HTML, right?

Instead of searching the raw string, you can parse the page with Beautiful Soup (see its documentation):

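from bs4 import BeautifulSoup

html = response.text
# html.parser is the parser bundled with Python
bs = BeautifulSoup(html, 'html.parser')
# collect the data-sold value of every <ul> that carries that attribute
sold_values = [item['data-sold'] for item in bs.find_all('ul', attrs={'data-sold': True})]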

You can see another reference here.

Alternatively, consider a parallel for loop in Python, so that many requests are made at the same time.
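
The parallel idea can be sketched with the standard library's concurrent.futures. This is only an illustration under assumptions: check_links, fetch_text and the max_workers default are hypothetical names and values chosen here, not part of the original answer, and the fetcher is passed in as a parameter so the thread pool logic can be tried without network access.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_text(link):
    """Download one page and return its body as text (assumes requests is installed)."""
    import requests  # imported lazily so the rest of the sketch works without it

    return requests.get(link, timeout=10).text


def check_links(links, needle, fetch=fetch_text, max_workers=8):
    """Return {link: True/False} telling whether needle occurs in each page.

    The downloads run in a small thread pool, so slow responses overlap
    instead of being waited for one after another.
    """
    links = list(links)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input links
        pages = pool.map(fetch, links)
        return {link: needle in page for link, page in zip(links, pages)}
```

Something like check_links(list_of_links, 'data=sold') would then check several pages per pass. Keeping max_workers small is probably wise, since many simultaneous requests to the same site may get the client blocked.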

2 Comments

Marià Over a year ago

Would it be faster?

Thotsaphon Sirikutta Over a year ago

Yes, it will be faster with a parallel loop, but keep in mind that if we send many requests to the same site at the same time, it may detect them as spam and block us.

Reductio · Accepted Answer · 2019-04-18 08:55:45Z

As already commented, whether you can request only a part of the page depends on the website/server. Since it is a website, I would think it's not possible.

If the website is really big, the only way I can currently think of to make the search faster is to process the data as it arrives. When you call requests.get(link), the whole page is downloaded before you can process the data. You could try calling

 r = requests.get(link, stream=True)

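instead, and then iterate through the lines:

 for line in r.iter_lines():
    # iter_lines() yields bytes, so compare against a bytes literal
    if b'data=sold' in line:
       print("hooray")
       break  # stop downloading once the string is found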

Of course, you could also analyze the raw stream and skip x bytes, use the aiohttp library, and so on. You may need to give some more information about your problem.
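
One detail of working on the raw stream is that a chunk boundary can fall in the middle of the string being searched for. The following is a minimal sketch, not a definitive implementation: stream_contains is a hypothetical helper that accepts any iterable of bytes chunks, carries over the last len(needle) - 1 bytes of what it has seen, and returns at the first match so the rest of the stream is never read.

```python
def stream_contains(chunks, needle):
    """Return True as soon as needle (bytes) shows up in a stream of byte chunks."""
    if not needle:
        return True
    keep = len(needle) - 1  # longest part of needle that can sit at a chunk's end
    tail = b""
    for chunk in chunks:
        window = tail + chunk
        if needle in window:
            return True  # stop early, remaining chunks are not consumed
        tail = window[-keep:] if keep else b""
    return False


# Possible use with requests (link is the URL from the question):
#     with requests.get(link, stream=True) as r:
#         found = stream_contains(r.iter_content(chunk_size=8192), b"data=sold")
```

Whether this helps depends on where the string sits in the page: the gain comes only from not downloading what follows the first match.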
