Multiple instances of my multi-threaded application (about 10 threads each) are running on different machines (about 10 machines), so roughly 100 threads are active simultaneously. Each thread produces 4 output sets, each containing 1k-5k rows. Every set is pushed to a single MySQL server, into the same database, with each of the 4 sets going to its own table (insert or update operations), so 4 tables consume the 4 sets produced by each thread. I am using MyBatis as the ORM. These threads may spend more time writing output to the DB than processing requests. How can I optimize the database writes in this case?

1. Use MyBatis batch processing.
2. Write data to files that are picked up by a single consumer thread and written to the DB.
3. Write each data set to a different file and use 4 consumer threads, each picking up the data destined for one table, so locking is minimized.

Please suggest other, better approaches if possible.
1 Answer
Databases are made to handle concurrency. I'm not sure exactly what MyBatis brings into the picture (I'm not a huge fan of ORMs in general), but if using it is what makes you start considering hacks like intermediate files and single-threaded updates, you are probably much better off ripping it out and writing to the DB with plain JDBC. That should have no problem handling your use case, provided you batch your updates adequately.
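A minimal sketch of what batched plain-JDBC writes could look like. The table and column names in the commented JDBC portion are hypothetical; the chunking helper is plain Java and keeps each `executeBatch()` call to a bounded size:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchWriter {
    // Split rows into fixed-size chunks so each executeBatch() call
    // sends a bounded number of statements per round trip.
    static <T> List<List<T>> partition(List<T> rows, int batchSize) {
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < rows.size(); i += batchSize) {
            chunks.add(rows.subList(i, Math.min(i + batchSize, rows.size())));
        }
        return chunks;
    }

    /*
     * Hypothetical write path (table/columns assumed, not from the question):
     *
     * conn.setAutoCommit(false);
     * try (PreparedStatement ps = conn.prepareStatement(
     *         "INSERT INTO results (thread_id, value) VALUES (?, ?) " +
     *         "ON DUPLICATE KEY UPDATE value = VALUES(value)")) {
     *     for (List<Row> chunk : partition(rows, 500)) {
     *         for (Row r : chunk) {
     *             ps.setLong(1, r.threadId);
     *             ps.setString(2, r.value);
     *             ps.addBatch();
     *         }
     *         ps.executeBatch();   // one round trip per chunk, not per row
     *         conn.commit();
     *     }
     * }
     */
}
```

With MySQL, adding `rewriteBatchedStatements=true` to the JDBC URL lets the Connector/J driver collapse a batch into multi-row INSERT statements, which typically helps a lot for this insert-heavy pattern.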
4 Comments
Ajay Bodhe
Can you elaborate on what you mean by batching updates adequately?
Dima
I mean that if a thread is updating, say, 50 rows per second, or updating the same row multiple times in a short period, you should accumulate those updates rather than making a separate SQL call each time a row needs to change. It sounds like that is what you are doing, given that your DB writes take more time than the data processing. Also inspect your schema for inefficiencies; it is very suspicious that your updates take this long. Are you using a connection pool? You should. Check its configuration too.
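The "accumulate updates instead of one SQL call per row" idea can be sketched as a small buffer that collects rows and hands them to a flush callback in batches. This is an illustrative helper, not MyBatis API; the callback is where the batched JDBC or MyBatis write would go:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Buffers per-row updates and flushes them as one batch once a
// threshold is reached, instead of issuing one SQL call per row.
public class UpdateBuffer<T> {
    private final int threshold;
    private final Consumer<List<T>> flusher;   // runs the actual batched DB write
    private final List<T> pending = new ArrayList<>();

    public UpdateBuffer(int threshold, Consumer<List<T>> flusher) {
        this.threshold = threshold;
        this.flusher = flusher;
    }

    public synchronized void add(T row) {
        pending.add(row);
        if (pending.size() >= threshold) {
            flush();
        }
    }

    // Also call this on shutdown or on a timer so stragglers are not lost.
    public synchronized void flush() {
        if (pending.isEmpty()) return;
        flusher.accept(new ArrayList<>(pending));
        pending.clear();
    }
}
```

Each worker thread can own one buffer per target table, which also collapses repeated updates to hot rows into fewer round trips.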
Ajay Bodhe
Thanks. I am using a separate DB connection from each thread. Is there a better way to handle connections? One more naive question: if the connections are never going to exceed 200-500, does connection pooling still matter?
Dima
As long as you keep the connection open and don't recreate it every time you talk to the DB, that is probably fine.
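If you do move to a pool rather than one long-lived connection per thread, the setup is a few lines. A sketch using HikariCP as an assumed dependency; the URL, credentials, and pool size below are placeholders, not tuned recommendations:

```java
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

// Connection-pool configuration sketch (all values are placeholders).
HikariConfig cfg = new HikariConfig();
cfg.setJdbcUrl("jdbc:mysql://db-host:3306/mydb?rewriteBatchedStatements=true");
cfg.setUsername("app");
cfg.setPassword("secret");
cfg.setMaximumPoolSize(20);  // often far fewer pooled connections than threads is enough

HikariDataSource ds = new HikariDataSource(cfg);
// Worker threads call ds.getConnection(), write their batch, and close()
// the connection, which returns it to the pool for reuse.
```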