I have the following code:

async def fetch(session, url):
    video_id = url.split('/')[-2]
    async with session.get(url) as response:
        data = await response.text()
        async with aiofiles.open(f'{video_id}.json', 'w') as f:
            await f.write(data)


async def main(loop, urls):
    async with aiohttp.ClientSession(loop=loop) as session:
        tasks = [fetch(session, url) for url in urls]
        await asyncio.gather(*tasks)


if __name__ == '__main__':
    links = generate_links()
    loop = asyncio.get_event_loop()
    await main(loop, links)

The script runs smoothly in a Jupyter notebook, but it fails when run as a .py script with SyntaxError: 'await' outside function.

I'm trying to understand what is happening here and why this is the case.

  • I'd assume it happens because your code runs inside some function in Jupyter. await main() should be changed to asyncio.run(main()) for the code to run successfully. docs.python.org/3/library/asyncio-task.html Commented Feb 22, 2021 at 16:16
  • @Galunid, I already tried that setup (i.e. asyncio.run(main(loop, links))), but it throws: RuntimeError: Timeout context manager should be used inside a task Commented Feb 22, 2021 at 16:21
  • The short answer is that Jupyter rewrites your code so that it works for the common case (namely using asyncio), the reason being convenience. Commented Feb 22, 2021 at 16:24

1 Answer

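For anybody else trying to figure it out, Galunid's tip was spot on. The remaining problem was how the loop object was used: the loop obtained from asyncio.get_event_loop() was passed to ClientSession(), but asyncio.run() creates and runs a new event loop of its own, so the session was bound to a different loop than the one running the tasks. Dropping the loop argument lets ClientSession() default to the current event loop, which inside main() is the one asyncio.run() started.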
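The loop mismatch can be seen without aiohttp at all. The following is a minimal sketch, not part of the original script: current_loop() and loops_match() are hypothetical helper names, and asyncio.new_event_loop() stands in for the loop that the question fetched up front.

```python
import asyncio


async def current_loop():
    # Inside a coroutine, this is the loop that is actually running it.
    return asyncio.get_running_loop()


def loops_match() -> bool:
    # Stand-in for the loop object created before asyncio.run() is called.
    outer_loop = asyncio.new_event_loop()
    try:
        # asyncio.run() creates its own fresh loop and closes it afterwards.
        inner_loop = asyncio.run(current_loop())
        return inner_loop is outer_loop
    finally:
        outer_loop.close()


if __name__ == '__main__':
    print(loops_match())
```

Anything bound to the outer loop, such as a session created with loop=outer_loop, ends up tied to a loop that never runs the tasks.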

The final form is given below.

import asyncio

import aiofiles
import aiohttp


async def fetch(session, link):
    # The second-to-last path segment of the URL is used as the file name.
    video_id = link.split('/')[-2]
    async with session.get(link) as response:
        data = await response.text()
        async with aiofiles.open(f'{video_id}.json', 'w') as f:
            await f.write(data)


async def main(urls):
    # No loop argument: the session binds to the loop started by asyncio.run().
    async with aiohttp.ClientSession() as session:
        tasks = [fetch(session, url) for url in urls]
        await asyncio.gather(*tasks)


if __name__ == '__main__':
    links = generate_links()  # defined elsewhere in the script, as in the question
    asyncio.run(main(links))

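Jupyter notebooks already run an event loop in the background and accept await at the top level of a cell, which is why the result can be awaited directly there. A plain .py script has no running loop, and await is only valid inside an async def function, hence the SyntaxError.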
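The difference between a script and a notebook cell can be reproduced with the built-in compile(). This is only a sketch: the helper names are hypothetical, and the assumption is that notebook kernels compile cells in a mode comparable to the ast.PyCF_ALLOW_TOP_LEVEL_AWAIT flag (available since Python 3.8). The snippet is compiled only, never executed, so main() does not need to exist.

```python
import ast

SNIPPET = "await main()"


def compiles_as_script(source: str) -> bool:
    # Regular module compilation, as used for a .py file.
    try:
        compile(source, "<script>", "exec")
        return True
    except SyntaxError:
        return False


def compiles_with_top_level_await(source: str) -> bool:
    # Same source, but with top-level await explicitly allowed.
    try:
        compile(source, "<cell>", "exec", flags=ast.PyCF_ALLOW_TOP_LEVEL_AWAIT)
        return True
    except SyntaxError:
        return False


if __name__ == '__main__':
    print(compiles_as_script(SNIPPET))
    print(compiles_with_top_level_await(SNIPPET))
```

In a script, the equivalent of the notebook convenience is to wrap the call in asyncio.run(), as in the final form of the code.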
