I have a task that requires joining two data streams into a single aggregation in Apache Flink, with some additional logic.
Basically I have two data streams: a stream of events and a stream of so-called meta-events, with Apache Kafka as the message backbone. What I'm trying to achieve is to trigger the evaluation of the aggregation/window based on the information carried by a meta-event. The basic scenario is:
1. The data stream of events starts to emit records of Type A.
2. The records keep accumulating in some aggregation or window, keyed by some key.
3. The data stream of meta-events receives a new meta-event with the given key, which also defines the total number of events that will be emitted in the data stream of events.
4. The number from step 3 becomes the trigger criterion for the aggregation: once the total count of Type A events with the given key reaches the number defined in the meta-event with that key, the aggregation should be triggered for evaluation.
Steps 1 and 3 occur in a non-deterministic order, so the meta-event may arrive either before or after the first events for its key.
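To make the setup concrete, here is roughly the shape of the input side. The `Event`, `MetaEvent`, and `Wrapped` types are simplified placeholders (not my real classes), and I'm assuming the two Kafka-backed streams can just be wrapped into a common type and unioned so that one keyed operator sees both kinds of records:

```java
import org.apache.flink.streaming.api.datastream.DataStream;

// Simplified placeholder types, not my real classes.
class Event     { public String key; /* ... Type A payload ... */ }
class MetaEvent { public String key; public long expectedCount; }

// Wrapper so both streams can be unioned and keyed by the same key.
class Wrapped {
    public String key;
    public Event event;     // non-null for regular events
    public MetaEvent meta;  // non-null for meta-events
    public boolean isMeta() { return meta != null; }
}

class Wiring {
    // Both inputs are consumed from their Kafka topics elsewhere in the job.
    static DataStream<Wrapped> merge(DataStream<Event> events, DataStream<MetaEvent> metaEvents) {
        DataStream<Wrapped> wrappedEvents = events
                .map(e -> { Wrapped w = new Wrapped(); w.key = e.key; w.event = e; return w; })
                .returns(Wrapped.class);
        DataStream<Wrapped> wrappedMeta = metaEvents
                .map(m -> { Wrapped w = new Wrapped(); w.key = m.key; w.meta = m; return w; })
                .returns(Wrapped.class);
        return wrappedEvents.union(wrappedMeta);
    }
}
```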
What I've tried so far is to look into Flink's GlobalWindows, but I'm not sure whether that would be a good and adequate solution. I'm also not sure whether this problem can be solved in Apache Flink at all.
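Concretely, what I had in mind for the GlobalWindows approach is a custom `Trigger` that keeps two pieces of per-key state (events seen so far and the expected total from the meta-event) and fires once they match. This is only a rough sketch on top of the placeholder `Wrapped` type above, and I'm not sure it handles trigger-state clean-up for global windows correctly:

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.streaming.api.windowing.triggers.Trigger;
import org.apache.flink.streaming.api.windowing.triggers.TriggerResult;
import org.apache.flink.streaming.api.windowing.windows.GlobalWindow;

// Fires a per-key global window once the number of Type A events reaches
// the total announced by the meta-event for that key.
class MetaCountTrigger extends Trigger<Wrapped, GlobalWindow> {

    private final ValueStateDescriptor<Long> seenDesc =
            new ValueStateDescriptor<>("seen-count", Long.class);
    private final ValueStateDescriptor<Long> expectedDesc =
            new ValueStateDescriptor<>("expected-count", Long.class);

    @Override
    public TriggerResult onElement(Wrapped element, long timestamp,
                                   GlobalWindow window, TriggerContext ctx) throws Exception {
        ValueState<Long> seen = ctx.getPartitionedState(seenDesc);
        ValueState<Long> expected = ctx.getPartitionedState(expectedDesc);

        if (element.isMeta()) {
            // Meta-event: remember how many Type A events to wait for under this key.
            expected.update(element.meta.expectedCount);
        } else {
            // Regular event: count it.
            long soFar = seen.value() == null ? 0L : seen.value();
            seen.update(soFar + 1);
        }

        Long total = expected.value();
        Long count = seen.value();
        if (total != null && count != null && count >= total) {
            return TriggerResult.FIRE_AND_PURGE;  // evaluate and drop the window contents
        }
        return TriggerResult.CONTINUE;
    }

    @Override
    public TriggerResult onProcessingTime(long time, GlobalWindow window, TriggerContext ctx) {
        return TriggerResult.CONTINUE;
    }

    @Override
    public TriggerResult onEventTime(long time, GlobalWindow window, TriggerContext ctx) {
        return TriggerResult.CONTINUE;
    }

    @Override
    public void clear(GlobalWindow window, TriggerContext ctx) throws Exception {
        ctx.getPartitionedState(seenDesc).clear();
        ctx.getPartitionedState(expectedDesc).clear();
    }
}
```

wired up roughly like this (`EvaluateFunction` is a hypothetical `ProcessWindowFunction` holding the actual evaluation logic, which would also need to skip the meta-event record that ends up in the window buffer):

```java
Wiring.merge(events, metaEvents)
        .keyBy(w -> w.key)
        .window(GlobalWindows.create())   // org.apache.flink.streaming.api.windowing.assigners.GlobalWindows
        .trigger(new MetaCountTrigger())  // the default trigger of GlobalWindows never fires, so a custom one is needed
        .process(new EvaluateFunction()); // hypothetical ProcessWindowFunction with the evaluation logic
```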
Any possible help is highly appreciated.