
I'm currently trying to send data to an Azure DocumentDB collection from Python (using the pydocumentdb library). I have to send about 100,000 documents to this collection, and it takes a very long time (about 2 hours).

I send each document one by one using:

    for document in documents:
        client.CreateDocument(collection_link, document)

Am I doing something wrong? Is there another, faster way to do it, or is it just normal that it takes this long?

Thanks!

  • I don't think there is any such thing as a batch insert here. There is a way to do what you want in .NET; look into this answer: stackoverflow.com/questions/41744582/… Commented Jun 22, 2017 at 10:36
  • My guess is that the CosmosDB Python SDK operations are all synchronous. This means that one call to client.CreateDocument() must complete its full round trip before the loop moves on to the next document, which is incredibly inefficient. You need more parallelism or bigger batches in your round trips. Not sure how you do the former in Python (see the sketch after these comments), but the latter can be accomplished by using a stored procedure where you send in an array of JSON documents (not all 100,000, but maybe 1,000 at a time) as input to the sproc. Commented Jun 22, 2017 at 14:05
  • Another option is to bypass the CosmosDB Python SDK and make REST calls directly. Here's how to make a batch of parallel requests: stackoverflow.com/questions/9110593/…. The difficulty with this approach is usually composing the authentication token, but you may be able to extract that from the Python SDK or find another SO answer that explains it. Commented Jun 22, 2017 at 14:28
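
To make the parallelism suggestion in the comments concrete, here is a minimal sketch using the standard-library ThreadPoolExecutor (Python 3) to overlap the synchronous round trips. The host, key, and collection link are placeholders, pydocumentdb's DocumentClient is not documented as thread-safe (hence one client per worker thread), and documents is the same iterable as in the question; tune max_workers against your provisioned throughput to avoid request-rate throttling.

    # Sketch only: HOST, MASTER_KEY and collection_link are placeholders.
    import threading
    from concurrent.futures import ThreadPoolExecutor

    import pydocumentdb.document_client as document_client

    HOST = 'https://<your-account>.documents.azure.com:443/'
    MASTER_KEY = '<your-master-key>'
    collection_link = 'dbs/<db-id>/colls/<collection-id>'

    # One DocumentClient per worker thread, since the SDK's thread
    # safety is not guaranteed.
    local = threading.local()

    def get_client():
        if not hasattr(local, 'client'):
            local.client = document_client.DocumentClient(
                HOST, {'masterKey': MASTER_KEY})
        return local.client

    def create(document):
        # Each call is still one synchronous round trip, but the pool
        # overlaps many of them instead of waiting one by one.
        get_client().CreateDocument(collection_link, document)

    with ThreadPoolExecutor(max_workers=20) as pool:
        # list() forces iteration so any exception surfaces here.
        list(pool.map(create, documents))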

1 Answer

On Azure, there are several ways to import data into Cosmos DB that are faster than using the PyDocumentDB API, which wraps the related REST APIs over HTTP.

First, prepare a JSON file that contains your 100,000 documents; then you can follow the documents below to import the data.

  1. Refer to the document How to import data into Azure Cosmos DB for the DocumentDB API? to import the JSON data file via the DocumentDB Data Migration Tool.
  2. Refer to the document Azure Cosmos DB: How to import MongoDB data? to import the JSON data file via MongoDB's mongoimport tool.
  3. Upload the JSON data file to Azure Blob Storage, then copy the data from Blob Storage to Cosmos DB using Azure Data Factory; see the section Example: Copy data from Azure Blob to Azure Cosmos DB for more details.

If you want to import the data programmatically, you can use a Python MongoDB driver to connect to Azure Cosmos DB and import the data via the MongoDB wire protocol; please refer to the document Introduction to Azure Cosmos DB: API for MongoDB. A minimal sketch follows.
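
As a sketch of that route (assuming the account was created with the MongoDB API, and using placeholder connection-string values from the Azure portal), pymongo's insert_many sends the documents in batches over the wire protocol instead of one request per document:

    # Sketch only: replace the URI placeholders with the connection
    # string shown in the Azure portal for your account.
    from pymongo import MongoClient

    uri = ('mongodb://<account>:<primary-key>@'
           '<account>.documents.azure.com:10255/?ssl=true')

    client = MongoClient(uri)
    collection = client['<db-name>']['<collection-name>']

    # insert_many groups the documents into a small number of
    # wire-protocol messages, far fewer round trips than one
    # insert per document.
    collection.insert_many(documents)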

Hope it helps.
