
What is the best way to send a lot of POST requests to a REST endpoint via Python? E.g. I want to upload ~500k files to a database.

What I've done so far is a loop that creates for each file a new request using the requests package.

from os import listdir
from os.path import join

# get list of files
files = [f for f in listdir(folder_name)]
# loop through the list
for file_name in files:
    try:
        # open file and get content
        with open(join(folder_name, file_name), "r") as file:
            f = file.read()
            # create request
            req = make_request(url, f)
    except OSError:
        # error handling, logging, ...
        pass
But this is quite slow: what is the best practice for doing this? Thank you.

1 Comment
[f for f in listdir(folder_name)] is redundant; just use for file_name in listdir(folder_name). Commented Feb 25, 2019 at 15:34

2 Answers


First approach:

I don't know if it is the best practice, but you can split the files into batches of 1,000, zip each batch, and send the archives as POST requests using threads (set the number of threads to the number of processor cores).

(The REST endpoint can extract the zipped contents and then process them.)

Second approach:

Zip the files in batches and transfer them batch by batch. After the transfer is complete, validate on the server side, then start the database upload in one go.
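A minimal sketch of the first approach, assuming a hypothetical endpoint `url` that accepts a zip archive as the request body and unpacks it server-side (the real endpoint, payload format, and batch size are placeholders you would adapt):

```python
import io
import os
import zipfile
from concurrent.futures import ThreadPoolExecutor

import requests


def zip_batch(folder_name, batch):
    """Bundle one batch of file names into an in-memory zip archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for file_name in batch:
            zf.write(os.path.join(folder_name, file_name), arcname=file_name)
    return buf.getvalue()


def send_batch(url, folder_name, batch):
    """POST one zipped batch; the server is assumed to unpack it."""
    payload = zip_batch(folder_name, batch)
    return requests.post(
        url, data=payload, headers={"Content-Type": "application/zip"}
    )


def upload_all(url, folder_name, batch_size=1000, workers=os.cpu_count()):
    """Split the folder into batches and send them from a thread pool."""
    files = os.listdir(folder_name)
    batches = [files[i:i + batch_size]
               for i in range(0, len(files), batch_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(send_batch, url, folder_name, b)
                   for b in batches]
        return [f.result() for f in futures]
```

Zipping in memory avoids touching the disk twice, and threads work well here because the workers spend most of their time waiting on network I/O, not the CPU.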


1 Comment

Zip could help reduce the number of requests, but that is still a lot of files. But I like the approach with multiple threads on different cores.

The first thing you want to do is determine exactly which part of your script is the bottleneck. You have both disk and network I/O here (reading files and sending HTTP requests, respectively).

Assuming that the HTTP requests are the actual bottleneck (highly likely), consider using aiohttp instead of requests. The docs have some good examples to get you started and there are plenty of "Quick Start" articles out there. This would allow your network requests to be cooperative, meaning that other Python code can run while one of your network requests is waiting. Just be careful not to overwhelm whatever server is receiving the requests.
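A rough sketch of this approach; `url`, the raw-body upload format, and the concurrency limit are assumptions you would replace with your endpoint's actual requirements. The semaphore is what keeps you from overwhelming the server:

```python
import asyncio
import os

import aiohttp


async def upload_file(session, sem, url, path):
    """Read one file and POST its contents, bounded by the semaphore."""
    async with sem:  # cap the number of in-flight requests
        with open(path, "r") as fh:
            body = fh.read()
        async with session.post(url, data=body) as resp:
            return resp.status


async def upload_all(url, folder_name, max_concurrent=50):
    """Schedule one upload task per file in the folder."""
    sem = asyncio.Semaphore(max_concurrent)
    async with aiohttp.ClientSession() as session:
        tasks = [
            upload_file(session, sem, url,
                        os.path.join(folder_name, name))
            for name in os.listdir(folder_name)
        ]
        return await asyncio.gather(*tasks)


# entry point, e.g.:
# statuses = asyncio.run(upload_all("https://example.com/api", "my_folder"))
```

Unlike one thread per request, this runs everything on a single thread and switches tasks whenever a request is waiting on the network, which scales comfortably to hundreds of thousands of uploads.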

2 Comments

Thanks for the reading material; my program isn't actually blocking, though. But is this still good (best) practice for uploading a lot of files?
I'm not sure that there is any best practice in regards to how to approach this. In general, async will help speed up situations where you need to make a large number of network requests.
