I have a Flink streaming pipeline with a RabbitMQ source, some filter and map operators, an AggregateFunction applied over a window (tumbling window of 5 minutes), and a RabbitMQ sink. I'm using the incremental RocksDB state backend (it is stored on EFS). Flink is deployed in a clustered environment.

My checkpoint size is growing gradually and never shrinks. I suspect my active keys grow without bound, since my keyBy() uses 'date' as one of the key fields, so I think I need to configure state TTL.

I was reading the Flink documentation (https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/fault-tolerance/state/). It looks like state TTL can be configured only for ProcessFunctions and RichFunctions (only they have the open() and getRuntimeContext() methods). Is my understanding correct?

I also read that an AggregateFunction itself is stateless and becomes stateful only when used with a window. Once the configured window is over, all the events are cleared, but its metadata is kept in state.

What is this metadata, and when is it cleared? Is there any way I can configure state TTL without moving to RichFunctions?

Some useful links I referred to:

  1. Cleanup configuration for ProcessWindowFunction's window state without TTL with RocksDB as backend
  2. https://nightlies.apache.org/flink/flink-docs-release-1.12/learn-flink/event_driven.html#example

Flink version used is 1.18

  • You said "Once the configured window is over, all the events are cleared, but its metadata is kept in state." Where are you getting this information from? Commented May 9, 2024 at 4:04
  • I read it on some website. I am not able to find it now. Commented May 9, 2024 at 9:34
  • From my understanding, the metadata here is the state. Is my understanding correct? Commented May 9, 2024 at 9:35

1 Answer

When a window fires, all the state for that key/window interval combination is removed. This includes the window timer. So if your checkpoint size really is growing continuously, something else is going on.

If your watermarks aren't progressing, then it's possible that windows aren't firing, in which case state can accumulate. This can happen when you have idle sources (or partitions in a source, typically with Kafka).
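To make the two cases concrete, here is a minimal sketch in plain Java (a toy model, not Flink code) of how per-key, per-window state behaves. The class `WindowStateSketch` and its methods are hypothetical names made up for illustration. It assumes event-time tumbling windows with zero offset and non-negative timestamps, and that a window fires once the watermark reaches the window end minus 1 ms. State is dropped when a window fires, and it piles up if the watermark never advances.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/** Toy model of keyed tumbling-window state. Hypothetical names, not Flink API. */
public class WindowStateSketch {
    static final long WINDOW_SIZE_MS = 5 * 60 * 1000L; // 5-minute tumbling window

    // One accumulator per "key@windowStart", similar to an incremental aggregate.
    private final Map<String, Long> state = new HashMap<>();
    private long watermark = Long.MIN_VALUE;

    // Start of the tumbling window that contains the timestamp (assumes zero offset).
    static long windowStart(long timestampMs) {
        return timestampMs - (timestampMs % WINDOW_SIZE_MS);
    }

    // Each event updates the accumulator for its (key, window) combination.
    void onEvent(String key, long timestampMs) {
        state.merge(key + "@" + windowStart(timestampMs), 1L, Long::sum);
    }

    // Advancing the watermark fires every window whose last timestamp it has
    // reached and removes that window's state. Returns how many windows fired.
    int advanceWatermark(long newWatermarkMs) {
        watermark = Math.max(watermark, newWatermarkMs);
        int fired = 0;
        Iterator<Map.Entry<String, Long>> it = state.entrySet().iterator();
        while (it.hasNext()) {
            String entryKey = it.next().getKey();
            long start = Long.parseLong(entryKey.substring(entryKey.lastIndexOf('@') + 1));
            if (start + WINDOW_SIZE_MS - 1 <= watermark) {
                it.remove();
                fired++;
            }
        }
        return fired;
    }

    int stateSize() {
        return state.size();
    }

    public static void main(String[] args) {
        WindowStateSketch sketch = new WindowStateSketch();
        // A new key (for example a new 'date') in every window, but no watermark progress.
        for (int i = 0; i < 10; i++) {
            sketch.onEvent("date-" + i, i * WINDOW_SIZE_MS);
        }
        System.out.println("entries with stalled watermark: " + sketch.stateSize());
        sketch.advanceWatermark(10 * WINDOW_SIZE_MS);
        System.out.println("entries after watermark advances: " + sketch.stateSize());
    }
}
```

In this model a 'date' field in the key only adds entries for windows that have not fired yet, so it does not by itself make state grow forever. Continuous growth shows up only while the watermark is stuck.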


5 Comments

It is a tumbling window. If the windows are not firing, how is my aggregation happening (it is working nicely)? I have not configured anything about watermarks (default settings are used).
"This can happen when you have idle sources (or partitions in a source, typically with Kafka)": can you explain more about this? I am using RabbitMQ source and sink operators. My window outputs to the sink. On the checkpoint data page of the UI, I can see that the window and sink operators combined hold more checkpointed data than the others, so I cannot tell whether the problem is with the window or the sink operator.
When I checked the local RocksDB directory on the TaskManager, it had only the window operator's folder with SST files inside. Does that mean only the window operator has state, which is written to the local RocksDB directory and later flushed to the checkpoint location? Why don't my source, filter, and map functions have any state stored? Are they stateless in my case?
I would suggest editing your question to include full details of your workflow (as much as you can share). Though I'm leaving today for a 4+ month backpacking trip, which means I won't be able to respond, sorry!
I have updated my question. Happy vacation :)
