
I am writing a tool in Python 3.6 that sends requests to several APIs (with various endpoints) and collects their responses to parse and save them in a database.

The API clients that I use only have synchronous methods for requesting a URL; for instance they use

urllib.request.Request(...)

or Kenneth Reitz's Requests library.

Since my API calls rely on synchronous versions of requesting a URL, the whole process takes several minutes to complete.

Now I'd like to wrap my API calls in async/await (asyncio).

All the examples / tutorials that I found want me to change the synchronous URL calls / requests to an async version of it (for instance aiohttp). Since my code relies on API clients that I haven't written (and I can't change) I need to leave that code untouched.

So is there a way to wrap my synchronous requests (blocking code) in async/await to make them run in an event loop?

I'm new to asyncio in Python. This would be a no-brainer in NodeJS. But I can't wrap my head around this in Python.

Update 2023-06-12

Here's how I'd do it in Python 3.9+:

import asyncio
import requests

async def main():
    response1 = await asyncio.to_thread(requests.get, 'http://httpbin.org/get')
    response2 = await asyncio.to_thread(requests.get, 'https://api.github.com/events')
    print(response1.text)
    print(response2.text)

asyncio.run(main())
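Note that the two awaits above still run one after the other. To actually overlap the blocking calls, combine asyncio.to_thread with asyncio.gather. A minimal sketch (with a dummy slow_fetch function standing in for requests.get, so it runs without network access):

```python
import asyncio
import time

def slow_fetch(url: str) -> str:
    # Stand-in for requests.get: blocks the calling thread.
    time.sleep(0.2)
    return f"response from {url}"

async def main() -> None:
    start = time.perf_counter()
    # Both blocking calls run in worker threads at the same time.
    r1, r2 = await asyncio.gather(
        asyncio.to_thread(slow_fetch, 'http://httpbin.org/get'),
        asyncio.to_thread(slow_fetch, 'https://api.github.com/events'),
    )
    elapsed = time.perf_counter() - start
    print(r1, r2, f"{elapsed:.2f}s")  # ~0.2s total, not 0.4s

asyncio.run(main())
```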
  • You have synchronous methods that you want to call asynchronously. Therefore you need to write async wrappers for them - any Python async tutorial teaches you how to do this. Here's one: stackabuse.com/python-async-await-tutorial. Whether you are calling functions that do HTTP requests internally or anything else doesn't make any difference at all. Commented Jun 25, 2017 at 11:18
  • I know that example. That one uses an async version of requesting a URL. It uses client = aiohttp.ClientSession(loop=loop) and later async with client.get(url)... Commented Jun 25, 2017 at 12:10
  • I'm not sure what you mean. Have you read the entire post or just the lines with http in them? Commented Jun 25, 2017 at 14:15
  • I think @Ugur is right. The tutorial does not contain an example that shows how to wrap a synchronous function (one that doesn't return an awaitable) so that it can be used with async/await. The example in the tutorial wouldn't work if you replaced aiohttp with requests. Commented Jan 21, 2018 at 5:11
  • Possible duplicate of How could I use requests in asyncio? Commented Jan 21, 2018 at 5:19

1 Answer


The solution is to wrap your synchronous code in a thread and run it from the event loop. I used this exact approach to make my asyncio code run boto3 (note: remove the inline type hints if running on Python < 3.6):

# Method of a storage-backend class; needs asyncio, functools,
# typing, boto3 and botocore.exceptions imported.
# `base` is this codebase's own exceptions module.
async def get(self, key: str) -> bytes:
    s3 = boto3.client("s3")
    loop = asyncio.get_event_loop()
    try:
        # run_in_executor takes positional args only, hence functools.partial
        # to bind the keyword arguments of s3.get_object.
        response: typing.Mapping = await loop.run_in_executor(  # type: ignore
            None,
            functools.partial(
                s3.get_object,
                Bucket=self.bucket_name,
                Key=key,
            ),
        )
    except botocore.exceptions.ClientError as e:
        if e.response["Error"]["Code"] == "NoSuchKey":
            raise base.KeyNotFoundException(self, key) from e
        elif e.response["Error"]["Code"] == "AccessDenied":
            raise base.AccessDeniedException(self, key) from e
        else:
            raise
    return response["Body"].read()

Note that this works because the vast majority of the time in the s3.get_object() call is spent waiting on I/O, and while waiting on I/O Python (generally) releases the GIL. (The GIL is the reason that threads in Python are usually not a good fit for CPU-bound work.)

The first argument None to run_in_executor means that the call runs in the default executor, which is a thread-pool executor. It can make things clearer to pass a ThreadPoolExecutor explicitly instead.
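As a minimal sketch of passing an explicit pool (with a dummy blocking_call function standing in for the real API request, so it runs without network access):

```python
import asyncio
import concurrent.futures
import time

def blocking_call(i: int) -> int:
    # Stand-in for a synchronous API request; blocks its thread.
    time.sleep(0.05)
    return i * 2

async def main() -> list:
    loop = asyncio.get_running_loop()  # 3.7+; use get_event_loop() on 3.6
    # An explicit pool makes the thread budget visible and tunable.
    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
        futures = [loop.run_in_executor(pool, blocking_call, i) for i in range(8)]
        return await asyncio.gather(*futures)

print(asyncio.run(main()))  # [0, 2, 4, 6, 8, 10, 12, 14]
```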

Note that, whereas with pure async I/O you could easily have thousands of connections open concurrently, a thread-pool executor needs a separate thread for each concurrent API call. Once the pool runs out of threads, it will not schedule a new call until a thread becomes available. You can obviously raise the number of threads, but each thread costs memory; don't expect to be able to go much beyond a couple of thousand.
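One way to keep the number of in-flight calls bounded without enlarging the pool is to wrap each call in an asyncio.Semaphore. A sketch using asyncio.to_thread (Python 3.9+), again with a dummy blocking function in place of the real request:

```python
import asyncio
import time

def blocking_call(i: int) -> int:
    # Stand-in for a synchronous API request.
    time.sleep(0.05)
    return i

async def main() -> list:
    # At most 10 blocking calls (and hence 10 threads) in flight at once.
    sem = asyncio.Semaphore(10)

    async def bounded(i: int) -> int:
        async with sem:
            return await asyncio.to_thread(blocking_call, i)

    return await asyncio.gather(*(bounded(i) for i in range(50)))

print(len(asyncio.run(main())))  # 50
```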

Also see the Python ThreadPoolExecutor docs for an explanation and some slightly different example code on how to wrap a sync call in async code.
