Is it possible to trigger a checkpoint from a Flink streaming job?

My use case: I have two streams, R and S, that I join using tumbling time windows. The source is Kafka. I use event-time processing and a BoundedOutOfOrdernessGenerator to make sure events from the two streams end up in the same window.

The problem is that my state is large, and a regular periodic checkpoint sometimes takes too long. At first, I wanted to disable checkpointing and rely on Kafka offsets. But out-of-orderness means that, at any given offset, I already hold some data that belongs to future windows. So I need checkpointing.

If it were possible to trigger checkpoints right after a window gets cleaned, instead of periodically, it would be more efficient. Maybe in the evictAfter method.

Does that make sense, and is it possible? If not, I'd appreciate a workaround.

  • In the Flink environment you can try to reduce the checkpoint interval. Have you seen the 1.2 release notes? ci.apache.org/projects/flink/flink-docs-release-1.2/dev/stream/… Commented Jul 19, 2017 at 21:40
  • I don't see how that helps. Even if I take checkpoints less frequently, they are still going to be large. I want to trigger checkpoints when I have the fewest events in the operators, for efficiency. Commented Jul 19, 2017 at 22:03
  • More frequently. Reducing the interval would make the checkpoints smaller. Commented Sep 5, 2017 at 21:13

1 Answer

It seems the issue here is checkpoint efficiency. Consider using the RocksDB state backend with incremental checkpoints, discussed in the docs under "Debugging and Tuning Checkpoints and Large State".
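
As a rough sketch of what that suggestion might look like in practice, the fragment below enables RocksDB with incremental checkpoints in flink-conf.yaml. It makes several assumptions. The key names are the ones used by Flink releases that expose incremental checkpointing as a configuration option, and they have been renamed in recent versions, so check the configuration reference for your release. The checkpoint directory is a made-up placeholder. The RocksDB state backend dependency also has to be available to the job.

```yaml
# flink-conf.yaml (sketch; key names vary between Flink versions)

# Keep keyed and window state in RocksDB instead of on the JVM heap.
state.backend: rocksdb

# Upload only the RocksDB files that changed since the previous checkpoint,
# rather than a full copy of the state each time.
state.backend.incremental: true

# Placeholder path: replace with a durable location your cluster can reach.
state.checkpoints.dir: hdfs:///flink/checkpoints
```

With incremental checkpoints, the cost of each checkpoint depends mostly on how much state changed since the last one rather than on the total state size, which is why periodic checkpoints can stay cheap even when the window state is large.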
