If I had to develop an application with these characteristics:

  1. A core Java application that processes CSV files and stores the output in an open-source DB
  2. The data size would be 10 GB initially (ported from existing sources)
  3. The data would grow by 1 GB per month
  4. A typical transaction could fetch 100,000 rows
  5. It could be accessed by 1000 users at a given time

and had a choice of

  1. MongoDB
  2. MySQL
  3. PostgreSQL

which would be the best choice of DB?

This compares MongoDB with MySQL

This compares PostgreSQL to MySQL

Security alerts for MongoDB

  • 2
    This is off-topic because you are asking for a tool recommendation and the subject is too broad and the choice is heavily a matter of opinion. I'm curious which of these reasons will be the final one chosen. Commented Apr 9, 2015 at 15:34
  • 2
    Answer to the Ultimate Question of Life, the Universe, and Everything: 42 Commented Apr 9, 2015 at 16:06
  • @FrankHeikens: +1 if you know the question. ;) Commented Apr 9, 2015 at 16:24
  • 1
    Does your data have a structure or not? Are there important correlations between the entries? Will the data only grow? Commented Apr 9, 2015 at 20:19
  • 1
    Tip: if all the CSV files share the same structure, you probably want a relational DB. If the CSV files have different, unpredictable structures, think about other storage options, such as a schemaless store. Which is best? Good luck with that; if there were a single definitive answer, there wouldn't be multiple products available. You also forgot Cassandra, Firebird, SQLite, HSQLDB, Derby, BDB, Redis, .... Commented Apr 10, 2015 at 1:40

1 Answer


As the data grows, it's better to have a DB that scales easily. SQL databases don't scale out as smoothly and can eventually struggle, which is why Big Data workloads usually run on highly scalable DBs. But you said that entries can be correlated with each other, and in that case a relational DB is the better choice, because NoSQL stores can "lose" some of those relationships. As @Craig Ringer said, don't consider only those three DBs: there are many other solutions, each with its own pros and cons. For example, Redis is very fast but supports almost no complex logic, because it is a simple key-value store; Cassandra can be faster than Mongo but works better with data that follows a schema; and Mongo is a document DB, so it can store any kind of data in the same collection.

IMHO you should set up some benchmarking sessions with the different DBs and your own use cases, focus on what you need to be fast, and then choose the DB that performs best in that area.
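
A benchmarking session like that could start from something as small as the sketch below. It is only an illustration: `QueryBenchmark`, `medianNanos` and `fakeFetch` are made-up names, and `fakeFetch` just builds 100,000 strings in memory as a stand-in for the real "fetch 100,000 rows" call, which you would replace with the actual query against each candidate DB.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical harness: times any workload that returns a row count.
class QueryBenchmark {

    // Median of the samples (upper median when the count is even).
    static long median(long[] samples) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        return sorted[sorted.length / 2];
    }

    // Runs the workload `warmup` times untimed, then `runs` times timed,
    // and returns the median elapsed time in nanoseconds.
    static long medianNanos(Supplier<Integer> workload, int warmup, int runs) {
        for (int i = 0; i < warmup; i++) {
            workload.get();
        }
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            workload.get();
            samples[i] = System.nanoTime() - start;
        }
        return median(samples);
    }

    // Assumption: a placeholder for the real fetch. Swap in the query
    // you want to measure against each candidate DB.
    static int fakeFetch(int rowCount) {
        List<String> rows = new ArrayList<>(rowCount);
        for (int i = 0; i < rowCount; i++) {
            rows.add("row-" + i);
        }
        return rows.size();
    }

    public static void main(String[] args) {
        long nanos = medianNanos(() -> fakeFetch(100_000), 3, 10);
        System.out.println("median fetch time: " + nanos / 1_000_000.0 + " ms");
    }
}
```

The warm-up runs are there so JIT compilation and cold caches don't dominate the numbers, and the median is less sensitive to a single slow run than the mean. A single-threaded loop like this says nothing about the 1000 concurrent users, so that case would need a separate test that issues requests from many threads.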
