0

I have a specific task to join two data streams in one aggregation using Apache Flink with some additional logic.

Basically I have two data streams: a stream of events and a stream of so-called meta-events. I use Apache Kafka as a message backbone. What I'm trying to achieve is to trigger the aggregation/window to the evaluation based on the information given in meta-event. The basic scenario is:

  1. The Data Stream of events starts to emit records of Type A;
  2. The records keep accumulating in some aggregation or window based on some key;
  3. The Data Stream of meta-events receives a new meta-event with the given key which also defines a total amount of the events that will be emitted in the Data Stream of events.
  4. The number of events form the step 3 becomes a trigger criteria for the aggregation. After a total count of Type A events with a given key becomes equal to the number defined in the meta-event with a given key the aggregation should be triggered to the evaluation.

Steps 1 and 3 occur in the non-deterministic order, so they can be reordered.

What I've tried is to analyze the Flink Global Windows but not sure whether it would be a good and adequate solution. I'm also not sure if such problem has a solution in Apache Flink.

Any possible help is highly appreciated.

1 Answer 1

2

The simplistic answer is to .connect() the two streams, keyBy() the appropriate fields in each stream, and then run them into a custom KeyedCoProcessFunction. You'd save the current aggregation result & count in the left hand (Type A) stream state, and the target count in the right hand (meta-event) stream state, and generate results when the aggregation count == the target count.

But there is an issue here - what happens if you get N records in the Type A stream before you get the meta-event record for that key, and N > the target count? Essentially you either have to guarantee that doesn't happen, or you need to buffer Type A events (in state) until you get the meta-event record.

Though similar situations could occur if the meta-event target can be changed to a smaller value, of course.

Sign up to request clarification or add additional context in comments.

1 Comment

Hi @kkruglet, thank you! In my case the equality of the target number of events in meta is guaranteed.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.