I have a web scraping function that fetches data from 190 URLs. To speed it up I use concurrent.futures.ThreadPoolExecutor, and I save the fetched data to a SQL Server database. I need to repeat the whole process every 3 minutes from 9 AM to 4 PM. But when I wrap it in a while loop or a scheduler, the concurrent.futures part does not work: there is no error and no output.

# required libraries
import concurrent.futures
import time

import requests

urls = []  # list of the URLs to scrape

def data_fetched(url):
    # data fetching
    # operations on data
    # data saving to SQL server
    return ''

while True:
    with concurrent.futures.ThreadPoolExecutor() as executor:
        # pass the whole list (urls), not the undefined name url
        executor.map(data_fetched, urls)
    time.sleep(60)

I want to repeat all of this every 3 minutes. The code below shows the flow I have in mind. Please help me schedule it.

import time
from datetime import datetime as dt, timedelta

start = dt.strptime("09:15:00", "%H:%M:%S")
end = dt.strptime("15:30:00", "%H:%M:%S")
# gap between runs, in minutes
min_gap = 3

# compute the list of run times as zero-padded "HH:MM" strings
arr = [(start + timedelta(minutes=min_gap * i)).strftime("%H:%M")
       for i in range(int((end - start).total_seconds() / 60.0 / min_gap))]
while True:
    now = dt.now()  # gets current datetime
    weekno = now.weekday()  # 0 = Mon ... 6 = Sun
    # zero-padded hour and minute; seconds are dropped because
    # the loop only wakes up about once a minute
    current_time = now.strftime("%H:%M")

    # checks if current time is in the run-time list
    if weekno < 5 and current_time in arr:
        print('data_loaded')
    else:  # 5 Sat, 6 Sun, or not a scheduled minute
        pass
    time.sleep(60)

So inside this while loop I want to call that function using concurrent.futures.
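
The weekday and time-window check described above could also be written by comparing datetime.time values directly instead of matching formatted strings. This is only a minimal sketch: the helper name in_window and the constants START and END are hypothetical, and the 09:15 to 15:30 window is taken from the code above.

```python
from datetime import datetime, time as dtime

START = dtime(9, 15)   # window start, as in the code above
END = dtime(15, 30)    # window end, as in the code above

def in_window(now: datetime) -> bool:
    """Return True on Monday to Friday between START and END (inclusive)."""
    return now.weekday() < 5 and START <= now.time() <= END

# Quick check with fixed datetimes instead of the real clock
print(in_window(datetime(2024, 1, 8, 9, 15)))   # a Monday, inside the window
print(in_window(datetime(2024, 1, 6, 12, 0)))   # a Saturday
```

Inside the loop, the thread pool would then be started only when in_window(datetime.now()) is true.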

1 Answer

You can create a separate function that runs data_fetched() through the thread pool, and schedule that function with the schedule library. I assume your urls variable contains the list of URLs and is not an empty list.

import concurrent.futures
import time

import requests
from schedule import every, repeat, run_pending


urls = []  # fill with the URLs to scrape

def data_fetched(url):
    # data fetching
    # operations on data
    # data saving to SQL server
    return ''

# run the whole batch every 3 minutes
@repeat(every(3).minutes)
def execute_script():
    with concurrent.futures.ThreadPoolExecutor() as executor:
        executor.map(data_fetched, urls)

# keep the scheduler alive and fire any job that is due
while True:
    run_pending()
    time.sleep(1)
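
One possible reason for seeing "no error and no output": executor.map returns a lazy iterator, and an exception raised inside a worker is only re-raised when that result is read. If the results are never iterated, failures stay silent. Below is a minimal sketch that collects results and errors per URL with submit and as_completed. The names run_batch and fake_fetch and the example.com URLs are hypothetical stand-ins, and max_workers=8 is an arbitrary choice.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_batch(fetch, urls, max_workers=8):
    """Run fetch(url) for every URL and return (results, errors) keyed by URL."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # map each future back to the URL it was submitted for
        futures = {executor.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()  # re-raises the worker's exception
            except Exception as exc:
                errors[url] = exc
    return results, errors

def fake_fetch(url):
    # hypothetical stand-in for data_fetched; fails for one URL on purpose
    if "bad" in url:
        raise ValueError(f"cannot fetch {url}")
    return len(url)

results, errors = run_batch(
    fake_fetch, ["https://example.com/a", "https://example.com/bad"]
)
print(results)
print(errors)
```

In execute_script, calling something like run_batch(data_fetched, urls) and logging the errors dict would make any failure in data_fetched visible on each run.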

2 Comments

Thanks @Jamiu Shaibu, I used a for loop and your suggested flow to schedule concurrent.futures. It's working as expected now.
You are welcome @varshapatil, I'm glad it worked for you.
