2

I have a Java application in which I need to use multithreading. I have a list of IDs, each of which is the primary key for tables stored in DynamoDB.

Say the list is:

| ID_1 | ID_2 | ID_3 | ID_4|.......| ID_n|

Now I want multiple threads to read these IDs and do the following for each ID:

  1. Each thread should take an ID and query the DynamoDB tables (there are two DynamoDB tables for which the ID is the primary key).

  2. The result of querying each DynamoDB table should be stored in a separate file.

Essentially, Thread_1 should pick up an ID, say ID_1, and query the DynamoDB tables DDB_1 and DDB_2. The result of querying DDB_1 should go in File_1 and the result of querying DDB_2 should go in File_2. This needs to be done by all the threads. Finally, when all threads have completed execution, I should have two files, File_1 and File_2, containing the query results from all the threads.

I have come up with a solution in which all producer threads (the threads that get the query results from DynamoDB) put their DDB_1 results on a single queue, and a single consumer thread drains that queue and writes to File_1. Similarly, all producer threads put their DDB_2 results on a second queue, and a second consumer thread writes them to File_2.
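A minimal sketch of this design, assuming a fixed pool of producer threads and one writer thread per file. The names `QueryPipeline`, `run`, `queryDdb1` and `queryDdb2` are hypothetical, and the two query functions are stand-ins for the real DynamoDB calls, which are not shown here:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

class QueryPipeline {
    // Sentinel ("poison pill") compared by identity, so it can never collide with a real result.
    private static final String DONE = new String("DONE");

    // Starts the single consumer for one file; only this thread ever writes to it.
    private static Thread startWriter(BlockingQueue<String> queue, Path file) {
        Thread writer = new Thread(() -> {
            try (BufferedWriter out = Files.newBufferedWriter(file)) {
                for (String line = queue.take(); line != DONE; line = queue.take()) {
                    out.write(line);
                    out.newLine();
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();
        return writer;
    }

    // queryDdb1 / queryDdb2 are hypothetical stand-ins for the two table lookups.
    static void run(List<String> ids, int workers,
                    Function<String, String> queryDdb1,
                    Function<String, String> queryDdb2,
                    Path file1, Path file2) throws InterruptedException {
        // Unbounded queues keep the sketch short; a bounded queue would add back-pressure.
        BlockingQueue<String> queue1 = new LinkedBlockingQueue<>();
        BlockingQueue<String> queue2 = new LinkedBlockingQueue<>();
        Thread writer1 = startWriter(queue1, file1);
        Thread writer2 = startWriter(queue2, file2);

        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (String id : ids) {
            pool.execute(() -> {
                queue1.add(queryDdb1.apply(id)); // producer side for File_1
                queue2.add(queryDdb2.apply(id)); // producer side for File_2
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS); // wait for all producers

        // All producers are done, so tell both writers to finish and wait for them.
        queue1.add(DONE);
        queue2.add(DONE);
        writer1.join();
        writer2.join();
    }
}
```

Results land in the files in completion order, not in ID order, and a query that throws is silently dropped in this sketch; real code would need to handle both.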

Do you feel any flaw in the approach above? Is there a better way to apply multi-threading in this case?

2
  • Please post your code here; try to post an MCVE. Commented May 3, 2016 at 11:55
  • I am currently deciding on the design and will start implementing it once a possible solution is found. I need to check whether this is a feasible solution or whether there is something better to try. Commented May 3, 2016 at 12:09

3 Answers

1

This is what you want to achieve:

ID_1 -> Thread1 -> Query DB1 ->  ConsumerSingleton -> Write data to File 1
                -> Query DB2 ->  ConsumerSingleton -> Write data to File 2
ID_2 -> Thread2 -> Query DB1 ->  ConsumerSingleton -> Write data to File 1
                -> Query DB2 ->  ConsumerSingleton -> Write data to File 2

ID_3 -> Thread3 -> Query DB1 ->  ConsumerSingleton -> Write data to File 1
                -> Query DB2 ->  ConsumerSingleton -> Write data to File 2
..
..  
ID_N -> ThreadN -> Query DB1 ->  ConsumerSingleton -> Write data to File 1
                -> Query DB2 ->  ConsumerSingleton -> Write data to File 2

Since a single consumer does all the writing, you don't have to synchronize the write operations on file1 and file2. However, you do have to synchronize the operation/method through which your threads hand their results to the consumer's collection. You can use a thread-safe collection such as ConcurrentHashMap in your consumer class to collect the results from the different threads.

Also, since you are reading rows from DB1 and DB2 by unique IDs, row-level locking should not occur while multiple threads access the tables. If that is not the case and two threads try to read the row with the same ID, contention can happen.
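A minimal sketch of such a collection step, assuming results are keyed by ID and represented as strings. `ResultCollector`, `accept`, `collect` and the `query` function are hypothetical names, and `query` stands in for the real database lookup:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

class ResultCollector {
    // Thread-safe map from ID to result; note that ConcurrentHashMap rejects null keys and values.
    private final Map<String, String> results = new ConcurrentHashMap<>();

    // Safe to call from many worker threads at once without extra locking.
    void accept(String id, String result) {
        results.put(id, result);
    }

    // Copy taken by the consumer once the workers have finished.
    Map<String, String> snapshot() {
        return new HashMap<>(results);
    }

    // Runs the (hypothetical) query for every ID on a fixed pool and gathers the results.
    static Map<String, String> collect(List<String> ids, int workers,
                                       Function<String, String> query) throws InterruptedException {
        ResultCollector collector = new ResultCollector();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (String id : ids) {
            pool.execute(() -> collector.accept(id, query.apply(id)));
        }
        pool.shutdown();
        if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
            throw new IllegalStateException("workers did not finish in time");
        }
        return collector.snapshot();
    }
}
```

With this shape the consumer writes the file only after all workers are done, so every result is held in memory first; that may or may not be acceptable depending on the size of the result set.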

1

If I understand correctly, you want two threads that each query a DB table and write the results to a file. See below.

APPLICATION
|
|-->THREAD --> DB_1 --> file1
|
|-->THREAD --> DB_2 --> file2

First off, this should be perfectly fine: the threads are not reading from or writing to the same data, so this is thread-safe. One way to do this (just an example) is to create a class for each thread by implementing Runnable, then place all the code for connecting to the DB in the run method. Long example: http://www.tutorialspoint.com/java/java_multithreading.htm

Short example

class Thread1 implements Runnable {

    @Override
    public void run() {
        // connect to the DB, run the query and write the results here
    }
}

Call by using

// Runnable has no start() method, so wrap it in a Thread
Thread t = new Thread(new Thread1());
t.start();

This should work fine as long as you are not modifying the IDs while one of these threads is reading them.

Using synchronized

This restricts a method to one thread at a time on a given object. It is necessary, for example, when several threads write to the same file, because otherwise the threads will interrupt each other's writes.

public synchronized void write(String text, Writer file1, Writer file2) throws IOException {
    // write text to the file(s) here
}

Call this like a normal method from your threads. This does NOT guarantee the order in which the threads get access to the method: whichever thread acquires the lock first goes first, and Java makes no fairness guarantee among the threads that are waiting.
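As a sketch of how this could look when many threads share both files: the class below is hypothetical (`SharedFileWriter` and its parameter names are not from the question) and assumes each result is written as one line. All threads must call `write` on the same instance, because `synchronized` on an instance method locks that object only.

```java
import java.io.IOException;
import java.io.Writer;

class SharedFileWriter {
    private final Writer file1;
    private final Writer file2;

    SharedFileWriter(Writer file1, Writer file2) {
        this.file1 = file1;
        this.file2 = file2;
    }

    // Only one thread at a time can be inside this method for a given instance,
    // so a result and its trailing newline are never split by another thread's write.
    public synchronized void write(String result1, String result2) throws IOException {
        file1.write(result1);
        file1.write("\n");
        file2.write(result2);
        file2.write("\n");
    }
}
```

Each worker would call `shared.write(resultFromDb1, resultFromDb2)` on one common `shared` object. The lock is held for the duration of both writes, so slow disk I/O makes the other threads wait.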

2 Comments

I think you misunderstood me. I want k threads running, each thread querying two DynamoDB tables and publishing the results in two separate files, one per DDB table. So my Thread_1 writes to two files, FILE1 and FILE2, Thread_2 also writes to these two files, and so on up to Thread_k. What should the approach be then? Thanks

If you want to do this, you would have to synchronize your writes to the files, simply by adding synchronized to the method you call to write to them. If you don't, your threads will interrupt each other during the writing. Also note that this might not give results in the order you expect. This assumes the following situation:

APPLICATION
|
|-->THREAD --> DB_1 --> SYNCHRONIZED method(file1, file2)
|
|-->THREAD --> DB_2 --> SYNCHRONIZED method(file1, file2)

0

Do you feel any flaw in the approach above?

I can't spot one. But of course, I can only comment based on your high-level description of your algorithm. There will be right and wrong ways to implement it.

Is there a better way to apply multi-threading in this case?

It is hard to say, but I can't think of any alternative that is obviously better. There are (no doubt) alternatives, but the only way you could objectively determine which is best [1] would be to implement the various alternatives and benchmark them.

Note that the bottlenecks for this application are likely to be:

  • the effective throughput of your DynamoDB queries
  • the rate at which you can write the results to file

(Probably, the former will dominate.) Since both are going to be limited by "external" factors (e.g. disc I/O, networking, load on the database CPUs) you will most likely need to "tune" the number of worker threads you use.
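One rough way to do that tuning is to time the same workload with different pool sizes. The sketch below is only an illustration: `PoolSizeTuner` and `timeRun` are hypothetical names, and the 20 ms sleep is a made-up stand-in for a DynamoDB query plus file write, so the numbers it prints say nothing about a real deployment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

class PoolSizeTuner {
    // Runs the task once per ID on a fixed-size pool and returns the elapsed wall-clock time in ms.
    static long timeRun(List<String> ids, int workers, Consumer<String> task)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        long start = System.nanoTime();
        for (String id : ids) {
            pool.execute(() -> task.accept(id));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> ids = new ArrayList<>();
        for (int i = 1; i <= 100; i++) {
            ids.add("ID_" + i);
        }
        // Hypothetical workload: sleeping simulates waiting on the network or the disk.
        Consumer<String> simulatedQuery = id -> {
            try {
                Thread.sleep(20);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        for (int workers : new int[] {1, 2, 4, 8, 16}) {
            System.out.println(workers + " workers: " + timeRun(ids, workers, simulatedQuery) + " ms");
        }
    }
}
```

In a real measurement the task would perform the actual queries and writes, and the run would be repeated several times, since a single timing is noisy.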


1 - I assume you mean the one that has the best throughput.
