I'm currently trying to send data to an Azure DocumentDB collection from Python (using the pydocumentdb library). I have to send about 100,000 documents to this collection, and it takes a very long time (about 2 hours).
I send each document one by one using:

    for document in documents:
        client.CreateDocument(collection_link, document)
Am I doing something wrong? Is there a faster way to do it, or is it just normal that it takes so long?

Thanks!
client.CreateDocument() must complete its full round trip before the loop moves on to the next document. This is incredibly inefficient. You need either more parallelism or bigger batches per round trip. I'm not sure how you do the former in Python, but the latter can be accomplished with a stored procedure that takes an array of JSON documents as input (not all 100,000 at once, but maybe 1,000 at a time) and inserts them server-side.
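As a rough sketch of the batching side, the Python code would split the documents into chunks and send each chunk through one ExecuteStoredProcedure call (a real pydocumentdb DocumentClient method). The "bulkImport" stored procedure name is hypothetical here; you would have to register a sproc on the collection that loops over its array argument and calls createDocument for each element server-side.

```python
def chunk(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def bulk_insert(client, collection_link, documents, batch_size=1000):
    # `client` is a pydocumentdb DocumentClient. Sproc links follow the
    # "<collection_link>/sprocs/<sproc_id>" pattern; "bulkImport" is a
    # placeholder for whatever you named your stored procedure.
    sproc_link = collection_link + '/sprocs/bulkImport'
    for batch in chunk(documents, batch_size):
        # One round trip now inserts up to `batch_size` documents instead
        # of one. The batch is passed as the sproc's single array argument.
        client.ExecuteStoredProcedure(sproc_link, [batch])
```

Note that a single sproc execution is bounded by the server-side execution limit, so if a batch doesn't finish you may need to track how many documents the sproc reports inserting and resume from there with a smaller batch.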