3

I have a large MongoDB collection. I want to export this collection to CSV so I can then import it into a statistics package for data analysis.

The collection has about 15 GB of documents in it. I would like to split the collection into ~100 equally sized CSV files. Is there any way to achieve this using mongoexport? I could also query the whole collection with pymongo, split it, and write the CSV files manually, but I suspect that would be slower and require more code.

Thank you for your input.

3 Answers

5

You can do this with mongoexport's --skip and --limit options.

For example, if your collection holds 1,000 documents, you can export it in batches with a script loop (pseudocode):

loops = 100
count = db.collection.count()
batch_size = ceil(count / loops)   // round up so the remainder is not dropped

for (i = 0; i < loops; i++) {
    mongoexport --skip (batch_size * i) --limit batch_size --out export${i}.json ...
}

This assumes that your documents are roughly equal in size, so that equal document counts produce files of similar size.

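Note, however, that large skips are slow: the server has to walk past every skipped document before it returns anything. Early iterations (small skip) will therefore finish faster than later ones (large skip).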
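The batch arithmetic above can be sketched as a small shell dry run. It only prints the mongoexport commands and does not run them. The function name plan_exports, the names mydb and mycollection, and the field list f1,f2,f3 are placeholders, not taken from the question. Rounding the batch size up is an assumption made here so that the final documents are not left out.

```shell
#!/usr/bin/env bash
# Dry run: print one mongoexport command per batch instead of executing it.
# plan_exports, mydb, mycollection and f1,f2,f3 are hypothetical placeholders.

plan_exports() {
    local count=$1   # total number of documents in the collection
    local loops=$2   # number of output files wanted
    # Ceiling division, so the remainder is not dropped from the last file.
    local batch_size=$(( (count + loops - 1) / loops ))
    local i=0
    while [ "$i" -lt "$loops" ]; do
        echo "mongoexport --db mydb --collection mycollection" \
             "--type=csv --fields f1,f2,f3" \
             "--skip $(( batch_size * i )) --limit ${batch_size}" \
             "--out export${i}.csv"
        i=$(( i + 1 ))
    done
}

# Example: 1,000 documents split into 100 files of 10 documents each.
plan_exports 1000 100
```

When the count is not a multiple of the number of files, the last files come out smaller, and some may be empty. Piping the printed lines to a shell would run them one after another.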


1 Comment

Skipping 10M documents takes at least 12 hours (none of my runs has finished yet). Beware of using Mongo for any serious project.
0

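A better version of the loop above that runs all the exports in parallel, because you're as impatient as I am:

Assume we have 385892079 records. Dividing by 100 and rounding up gives a batch size of 3858921.

bs=3858921   # ceil(385892079 / 100)
for i in {0..99}   # start at 0 so the first batch is not skipped
do
    bsi=$((bs * i))
    # The trailing & runs each export in the background
    mongoexport --db dbnamehere --collection collectionNamehere --port 3303 \
        --fields="f1,f2,f3" \
        --out /opt/path/to/output/dir/dump.${i}.json -v \
        --skip ${bsi} --limit ${bs} &
done
wait   # block until every background export has finished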


0
# total=335584
limit=20974
skip=0
for i in {1..16}; do
    mongoexport --host localhost --db tweets --collection mycollection \
        --type=csv --fields tweet_id,user_name,user_id,text \
        --out master_new/mongo_rec_${i}.csv -v \
        --skip ${skip} --limit ${limit} --quiet
    skip=$((skip+limit))
done
