2

This is more of a conceptual, database-architecture question. To maintain data consistency, instead of using a NoSQL data store, I'm storing JSON objects as strings/text in MySQL. So a MySQL row will look like this:

ID, TIME_STAMP, DATA

I'll store JSON data in the DATA field. I won't be updating any rows; instead, I'll add new rows with the current timestamp. So, when I want the latest data, I just fetch the row with max(TIME_STAMP). I'm using Tornado with the Python MySQLdb driver as my primary backend application.
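A minimal sketch of the append-only scheme described above. To keep it runnable anywhere, it uses Python's built-in `sqlite3` as a stand-in for MySQL; the table name `snapshots` and the helper names `append_snapshot` and `latest_snapshot` are assumptions made for illustration, not part of the original setup.

```python
import json
import sqlite3
import time

# sqlite3 stands in for MySQL so the sketch is self-contained.
# Table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE snapshots (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        time_stamp REAL NOT NULL,
        data TEXT NOT NULL
    )
    """
)


def append_snapshot(conn, obj, time_stamp=None):
    """Insert a new row; existing rows are never updated."""
    if time_stamp is None:
        time_stamp = time.time()
    conn.execute(
        "INSERT INTO snapshots (time_stamp, data) VALUES (?, ?)",
        (time_stamp, json.dumps(obj)),
    )
    conn.commit()


def latest_snapshot(conn):
    """Return the newest JSON object, or None if the table is empty."""
    # Ordering by id as well breaks ties between equal timestamps.
    row = conn.execute(
        "SELECT data FROM snapshots ORDER BY time_stamp DESC, id DESC LIMIT 1"
    ).fetchone()
    return json.loads(row[0]) if row else None


append_snapshot(conn, {"status": "new"}, time_stamp=1.0)
append_snapshot(conn, {"status": "done"}, time_stamp=2.0)
print(latest_snapshot(conn))
```

With MySQLdb the same queries would use `%s` placeholders instead of `?`, and an index on the timestamp column would likely be needed to keep the "latest row" lookup cheap as the table grows.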

I find this approach very straightforward and less prone to errors. The JSON objects are fairly simple and are not heavily nested.

Is this approach fundamentally wrong? Are there any issues with storing JSON data as text in MySQL, or should I use file-system-based storage such as HDFS? Please let me know.

4
  • 2
    as long as you don't have to search within the data, you'll be fine Commented Jul 15, 2013 at 9:39
  • I recommend you try the "Redis" database. I have worked with it to manage data like your example and it's very easy to use. Commented Jul 15, 2013 at 9:40
  • It is fine; however, if the JSON documents are large and there are not too many of them, you may consider using the file system. If you have many records, consider a NoSQL solution. Commented Jul 15, 2013 at 11:50
  • Thanks guys I'll look into Redis. However, it's very important that I do not lose a single byte of data. Commented Jul 15, 2013 at 13:52

3 Answers

8

MySQL, as you probably know, is a relational database management system. It is designed for data that is related through keys, forming relations that can then be queried in complex ways. Your method will technically work (and be quite fast), but based on what I've seen so far, it will probably limit how much of that relational functionality you can leverage if the scope of your project grows more complex.

I would recommend a data store like Redis (a key-value store) or MongoDB (a document store), as they are designed for schemaless data rather than relational architectures.

That said, if you find the approach works fine for what you're building, just go ahead. You might hit some blockers later if you want to add complexity to your solution, but either way you'll learn something new. Good luck!


4 Comments

Thanks Phil. I just hope that the scope doesn't get complicated :)
Feel free to upvote if you feel the answer has been helpful. Upvoting and downvoting is a big part of what makes this site awesome!
Also keep in mind that the traditional big relational database management systems (Oracle, MySQL, MSSQL, etc.) have had an order of magnitude more man-hours put into them, and issues like intermittent data loss are seldom caused by the software itself; they are more often attributable to poor configuration.
@Phil : Definitely a +1 from my side for the first para.
1

Pradeeb, to help answer your question you need to analyze your use case. What kind of data are you storing? For me, this would be the deciding factor: every technology has a specific use case where it excels.

I think it is safe to assume that you use JSON because your data needs to be stored as very flexible documents, compared to the fixed schema of a traditional relational DB. Some data stores natively support such data structures, such as MongoDB, as Phil pointed out (it uses "binary JSON", or BSON). This would give you improved storage and/or improved search capabilities. Again, the utility depends entirely on your use case.

If you are looking for something like a job queue, horizontal scalability is not an issue, and you just need fast access to the latest data, you could use Redis, an in-memory key-value store that has hash (associative array) and list data types for this kind of thing. Alternatively, since you mentioned HDFS, horizontal scalability may very well be an issue, in which case I can recommend queue systems like Apache ActiveMQ or RabbitMQ.

Lastly, if you are writing heavily and the bottleneck is your data storage rather than your clients, look into distributed, flexible-schema data stores like HBase or Cassandra. They are heavily write-optimized, and data can be appended and kept in chronological order, so you can fetch the newest data efficiently.

Hope that helps.

1 Comment

Thanks. I have used Mongo before but write consistency is an issue with Mongo and people have lost data.
1

This is not a problem. You could also use the memcached plugin for InnoDB that MySQL introduced in version 5.6, which would be perfect, although I have never tried it.

Another approach is to use memcached as a cache. Write everything to both memcached and MySQL. When you read data, try memcached first; if the key is not there, read from MySQL. This is a common technique for reducing the load on the database.
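A rough sketch of that write-to-both, read-cache-first pattern. Plain dictionaries stand in for memcached and MySQL so the example runs on its own; the class name `WriteThroughStore` and its methods are hypothetical, and a real setup would swap in actual client objects.

```python
import json


class WriteThroughStore:
    """Writes go to both stores; reads try the cache first."""

    def __init__(self, cache, database):
        # Both arguments are dict-like stand-ins (assumptions):
        # `cache` for memcached, `database` for MySQL.
        self.cache = cache
        self.database = database

    def write(self, key, obj):
        payload = json.dumps(obj)
        # Write the durable store first so the cache never holds
        # data that the database does not have.
        self.database[key] = payload
        self.cache[key] = payload

    def read(self, key):
        payload = self.cache.get(key)
        if payload is None:
            # Cache miss: fall back to the database.
            payload = self.database.get(key)
            if payload is None:
                return None
            # Repopulate the cache for the next reader.
            self.cache[key] = payload
        return json.loads(payload)


cache, database = {}, {}
store = WriteThroughStore(cache, database)
store.write("latest", {"status": "done"})
cache.clear()  # simulate a cache eviction or restart
print(store.read("latest"))
```

Writing to the database before the cache is a design choice here: if the process dies between the two writes, the cache is merely stale or empty, and no data is lost.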
