
I need to run 20 tasks asynchronously (each task runs the same function, but with a different argument). Each task uses Python's yfinance API module. This is my current method:

  1. Define a list args with 20 elements; each element is the argument to be passed to the corresponding task.
  2. Define an async function get_data which I will run 20 times with a different argument each time.
  3. Define an async function main which will use asyncio.gather to run the 20 tasks asynchronously.

And here is the (pseudo)code:

import asyncio

stocks = []
args = ['arg1', 'arg2', ... , 'arg20']


async def get_data(arg):
    stock = Stock(arg)
    # do some yfinance calls
    return stock


async def main():
    global stocks
    tasks = [asyncio.ensure_future(get_data(arg)) for arg in args]
    stocks = await asyncio.gather(*tasks)


asyncio.run(main())

print(stocks)  # should be a list of 20 return values from the 20 tasks

Assume each task on its own takes 4 seconds to run. Then the 20 tasks should run in 4 seconds if it's running asynchronously. However, it is running in 80 seconds. If I remove all the async code and just run it synchronously, it runs in the same amount of time. Any help?

Thanks.

  • Your get_data() function is not awaiting anything, which is a red flag that it's async in name only, but in fact is blocking. To get the benefits of asyncio, you need to use an async library for accessing stocks (or whatever else the code needs), and use await. Commented Apr 26, 2021 at 7:30
  • You might want to read up on what "asynchronous" actually means – it is not the same as "in parallel". "How does asyncio actually work?" could be a worthwhile read, albeit a long one. Commented Apr 26, 2021 at 9:09
  • @S.Naj any feedback? Commented Apr 28, 2021 at 7:18
  • As a newbie to asynchronous code, I did not realize there was a difference between asynchronous and "in parallel." @ArtiomKozyrev's solution works exactly as desired, so I guess I learned that ThreadPoolExecutor runs synchronous code "in parallel," which is not the same as running code asynchronously. Commented May 3, 2021 at 2:23
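
The point made in the comments can be checked without yfinance. The following is a minimal sketch in which sleeps stand in for the real network calls; blocking_task, awaiting_task and timed_gather are hypothetical names chosen for illustration. A coroutine that blocks (time.sleep) makes gather run the tasks one after another, while a coroutine that awaits (asyncio.sleep) lets them overlap.

```python
import asyncio
import time


async def blocking_task(delay: float) -> None:
    # time.sleep never yields to the event loop, so the tasks run one by one.
    time.sleep(delay)


async def awaiting_task(delay: float) -> None:
    # await hands control back to the event loop while this task waits.
    await asyncio.sleep(delay)


async def timed_gather(task_func, count: int, delay: float) -> float:
    """Run `count` copies of task_func with gather; return elapsed seconds."""
    start = time.monotonic()
    await asyncio.gather(*(task_func(delay) for _ in range(count)))
    return time.monotonic() - start


if __name__ == "__main__":
    print("blocking:", asyncio.run(timed_gather(blocking_task, 5, 0.2)))
    print("awaiting:", asyncio.run(timed_gather(awaiting_task, 5, 0.2)))
```

The blocking variant should take roughly count × delay, and the awaiting variant roughly one delay, which mirrors the 80 seconds versus 4 seconds in the question.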

1 Answer


I checked the yfinance documentation and found the requests library in its requirements; that library is not async. This means that wrapping yfinance calls in asyncio coroutines will not make them run concurrently. You should use threading.Thread or concurrent.futures.ThreadPoolExecutor instead.

I made the following example for you, please run it and share your results.

from concurrent.futures import ThreadPoolExecutor
import yfinance as yf
from pprint import pprint
from time import monotonic


def get_stocks_data(name: str) -> dict:
    """Fetch the info dictionary for one ticker symbol (blocking call)."""
    tick = yf.Ticker(name)
    tick_info = tick.info
    return tick_info


if __name__ == '__main__':
    # some random stocks
    stocks = [
        'AAPL', 'AMD', 'AMZN', 'FB', 'GOOG', 'MSFT', 'TSLA', 'MSFT',
        'AAPL', 'AMD', 'AMZN', 'FB', 'GOOG', 'MSFT', 'TSLA', 'MSFT',
    ]
    start_time = monotonic()
    # you can raise max_workers and check whether the app runs faster,
    # e.g. use 16 as the maximum number of workers
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = pool.map(get_stocks_data, stocks)

    for r in results:
        pprint(r)

    print("*" * 150)
    print(monotonic() - start_time)
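
If you would rather keep the asyncio.gather structure from the question, the same kind of thread pool can be driven from asyncio through loop.run_in_executor. Below is a minimal sketch under stated assumptions, not something tested against yfinance: fetch_blocking is a hypothetical stand-in that sleeps instead of calling yf.Ticker(name).info, so it runs without network access, and fetch_all is likewise a name made up for this example.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_blocking(name: str) -> dict:
    """Hypothetical stand-in for a blocking call such as yf.Ticker(name).info."""
    time.sleep(0.2)  # simulates network latency
    return {"symbol": name}


async def fetch_all(names, max_workers: int = 4) -> list:
    """Run the blocking function in worker threads and gather the results."""
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each call returns an awaitable future backed by a pool thread.
        futures = [loop.run_in_executor(pool, fetch_blocking, n) for n in names]
        # gather returns the results in the same order as `names`.
        return await asyncio.gather(*futures)


if __name__ == "__main__":
    start = time.monotonic()
    results = asyncio.run(fetch_all(["AAPL", "AMD", "AMZN", "GOOG"]))
    print(results)
    print(time.monotonic() - start)
```

The concurrency still comes from the threads; asyncio only coordinates them, so this is mainly useful when the rest of the program is already built around an event loop.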

1 Comment

This solution worked exactly as desired. For 10 tasks running synchronously, run time was about 60 seconds. With this solution, run time was about 7 seconds, as expected. Thanks a lot for your effort and commitment.
