I am trying to do some scraping: I essentially have a list of URLs, fetch the HTML response for each, and then continue with the scraping. Naturally, I attempted to make the requests to the URLs asynchronously, but I have failed.
Here is what I have so far:
import aiohttp
import asyncio

async def save_file(row, file_path):
    with open(file_path, 'w') as f:
        f.write(row)

async def download_html(url_idx, url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as resp:
            # Requesting the page
            body = await resp.text()
            file_path = ...  # path to output file
            # some scraping logic......
            await save_file(..., file_path)  # result of the scraping

async def main():
    urls = ['url1', 'url2', 'url3']
    tasks = []
    for url_idx, url in enumerate(urls):
        task = asyncio.create_task(download_html(url_idx, url))
        tasks.append(task)
    await asyncio.gather(*tasks)

if __name__ == "__main__":
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
Running the above code gives me this error:
    There is no current event loop: loop = asyncio.get_event_loop()
What would be the right approach?
I would also love to know how to use tqdm to display a progress bar.
Use asyncio.run(main()) instead of creating and driving the event loop yourself. In recent Python versions, asyncio.get_event_loop() no longer implicitly creates a loop when none is running, which is why you see that error. asyncio.run() creates a fresh event loop, runs the coroutine to completion, and closes the loop for you. For the progress bar, tqdm ships asyncio-aware helpers in tqdm.asyncio; in particular, tqdm.gather is a drop-in replacement for asyncio.gather that updates a bar as tasks finish.
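Here is a minimal, self-contained sketch of the corrected entry point. The URLs and the body of download_html are stand-ins for your real aiohttp request and scraping logic, and tqdm is an optional third-party package (the code falls back to plain asyncio.gather if it is not installed):

```python
import asyncio

try:
    # tqdm's asyncio helpers (third-party, tqdm >= 4.48)
    from tqdm.asyncio import tqdm
    gather = tqdm.gather  # drop-in replacement that draws a progress bar
except ImportError:
    gather = asyncio.gather  # no progress bar, same semantics

async def download_html(url_idx, url):
    # Stand-in for the real aiohttp request + scraping + save_file logic
    await asyncio.sleep(0.01)
    return url_idx, url

async def main():
    urls = ['url1', 'url2', 'url3']
    tasks = [asyncio.create_task(download_html(i, u))
             for i, u in enumerate(urls)]
    # Results come back in the same order as the input tasks
    return await gather(*tasks)

if __name__ == "__main__":
    # asyncio.run creates the event loop, runs main(), and closes the loop
    results = asyncio.run(main())
    print(results)
```

Note that asyncio.run must be called from synchronous code, once, at the top level; calling it from inside an already-running coroutine raises a RuntimeError.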