I have a Flink streaming pipeline with a RabbitMQ source, some filter and map operators, an AggregateFunction with window operators (5-minute tumbling windows), and a RabbitMQ sink. I'm using the RocksDB state backend with incremental checkpoints (stored on EFS). Flink is deployed in a clustered environment.
My checkpoint size keeps growing gradually and never shrinks. I suspect my active keys grow without bound, because my keyBy() uses 'date' as part of the key. So I think I need to configure state TTL.
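To make the concern about the key space concrete, here is a minimal plain-Java sketch (no Flink involved). It assumes a hypothetical composite key of (deviceId, date); the names `KeyGrowthSketch`, `deviceId`, and the start date are made up for illustration and are not from my actual job. It only counts the distinct keys such a keyBy() would produce over time. It says nothing about how much of that state Flink actually retains.

```java
import java.time.LocalDate;
import java.util.HashSet;
import java.util.Set;

public class KeyGrowthSketch {

    // Hypothetical composite key (deviceId, date), mirroring a keyBy() that includes the date.
    static String key(String deviceId, LocalDate date) {
        return deviceId + "|" + date;
    }

    // Counts the distinct keys seen when `devices` devices each emit events
    // on `days` consecutive days.
    static int distinctKeys(int devices, int days) {
        Set<String> keys = new HashSet<>();
        LocalDate start = LocalDate.of(2024, 1, 1); // arbitrary start date (assumption)
        for (int d = 0; d < days; d++) {
            for (int i = 0; i < devices; i++) {
                keys.add(key("device-" + i, start.plusDays(d)));
            }
        }
        return keys.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctKeys(10, 1));  // prints 10
        System.out.println(distinctKeys(10, 30)); // prints 300
    }
}
```

With the date in the key, the number of distinct keys grows linearly with the number of days the job runs, which is why I suspect something needs to expire old keys.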
I was reading the Flink documentation (https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/fault-tolerance/state/). It looks like state TTL can be configured only for ProcessFunctions and RichFunctions (only they have the open() and getRuntimeContext() methods). Is my understanding correct?
I also read that an AggregateFunction is itself stateless, and that it becomes stateful when combined with windows. Once the configured window ends, all of its events are cleared, but its metadata is kept in state.
What is this metadata, and when will it be cleared? Is there any way I can configure state TTL without moving to RichFunctions?
Some useful links I referred to:
- Cleanup configuration for ProcessWindowFunction's window state without TTL with RocksDB as backend
- https://nightlies.apache.org/flink/flink-docs-release-1.12/learn-flink/event_driven.html#example
The Flink version used is 1.18.
"Once configured window gets over, all the events are cleared. but its metadata is kept in state." Where are you getting this information from?