For my current project, I have many threads (Producers) that create Tasks and many threads that consume these Tasks (Consumers).
Each Producer is identified by a unique name. A Task is made of:
- the name of its Producer
- a name
- data
My question concerns the data structure shared by the Producers and the Consumers.
Concurrent Queue?
Naively, we could imagine that the Producers populate a concurrent queue with Tasks and the Consumers read and consume the Tasks stored in it.
I think this solution would scale rather well, but one case is problematic: if a Producer very quickly creates two Tasks with the same name but different data (T1 and T2 have the same name, but T1 has data D1 and T2 has data D2), it is theoretically possible that they are consumed in the order T2 then T1!
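To make the naive approach concrete, here is a minimal Java sketch. The `Task` class, its field names, and `NaiveQueueDemo` are hypothetical names chosen for illustration, and `LinkedBlockingQueue` is just one possible concurrent queue.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical Task type: producer name, task name, and payload.
final class Task {
    final String producer;
    final String name;
    final String data;

    Task(String producer, String name, String data) {
        this.producer = producer;
        this.name = name;
        this.data = data;
    }
}

class NaiveQueueDemo {
    // Shared by all Producers and Consumers.
    static final BlockingQueue<Task> QUEUE = new LinkedBlockingQueue<>();

    // Called by Producer threads.
    static void produce(Task task) {
        QUEUE.add(task);
    }

    // Called by Consumer threads; blocks until a Task is available.
    static Task consume() throws InterruptedException {
        return QUEUE.take();
    }

    public static void main(String[] args) throws InterruptedException {
        produce(new Task("P1", "job", "D1")); // T1
        produce(new Task("P1", "job", "D2")); // T2
        // With a single thread the dequeue order is FIFO.
        System.out.println(consume().data); // D1
        System.out.println(consume().data); // D2
        // With several Consumer threads, each one takes a Task and
        // processes it independently, so the processing of T2 may
        // finish before the processing of T1.
    }
}
```

The queue itself hands out T1 before T2; the reordering I am worried about comes from two different Consumers working on them at the same time.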
Task Map + Queue?
Now, I imagine creating my own data structure (let's say MyQueue) based on a Map + Queue. Like a queue, it would have a pop() and a push() method.
- The pop() method would be quite simple
- The push() method would:
  - Check whether a matching Task is already in MyQueue (doing find() in the Map)
    - if found: the data stored in the Task to be inserted would be merged with the data stored in the found Task
    - if not found: the Task would be inserted in the Map and an entry would be added in the Queue
Of course, I'll have to make it safe for concurrent access... and that will certainly be my problem; I am almost sure that this solution won't scale.
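Here is a minimal sketch of what I have in mind, made thread-safe in the simplest possible way: one lock around both methods. Everything in it is an assumption for illustration: the `Task` fields, the `key()` format (it assumes names do not contain `/`), storing data as a list so that "merge" means "append", and a non-blocking `pop()` that returns `null` when empty.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical Task: data is a list so that merging is just appending.
final class Task {
    final String producer;
    final String name;
    final List<String> data = new ArrayList<>();

    Task(String producer, String name, String data) {
        this.producer = producer;
        this.name = name;
        this.data.add(data);
    }

    // Assumed identity of a Task: producer name plus task name.
    String key() {
        return producer + "/" + name;
    }
}

final class MyQueue {
    private final Map<String, Task> pending = new HashMap<>();
    private final Queue<String> order = new ArrayDeque<>();

    // Merge into a pending Task with the same key, or enqueue a new one.
    synchronized void push(Task task) {
        Task existing = pending.get(task.key());
        if (existing != null) {
            existing.data.addAll(task.data);
        } else {
            pending.put(task.key(), task);
            order.add(task.key());
        }
    }

    // Remove and return the oldest pending Task, or null if empty.
    synchronized Task pop() {
        String key = order.poll();
        return key == null ? null : pending.remove(key);
    }
}

class MyQueueDemo {
    public static void main(String[] args) {
        MyQueue q = new MyQueue();
        q.push(new Task("P1", "job", "D1")); // T1
        q.push(new Task("P1", "job", "D2")); // T2, merged into T1
        System.out.println(q.pop().data); // [D1, D2]
        System.out.println(q.pop());      // null
    }
}
```

Every push() and pop() goes through the same lock, which is exactly why I doubt it will scale. Note also that merging only happens while the first Task is still pending: once a Consumer has popped T1, a later T2 with the same name is queued as a new entry.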
So What?
So my question now is: what is the best data structure to use in order to fulfill my requirements?
ForkJoinPool. Also note that "increase the number of Consumers" will usually decrease throughput, not increase it (unless you run them on a different machine).