My team migrated to Airbyte at the start of 2024, and in mid-2024 we started using SQL Server's CDC capability.

However, one of the jobs has started failing again with a Java heap space error. At the time of investigation, the CDC _CT table had approximately 91M rows, and the log file retention period on the database is 3 days. An important point to note is that the job, which contains only one table, runs fine several times a day, syncing about 2M rows.

However, once a month, when a month-end process kicks off and makes a large change to the table, the job starts failing.

This is our current values.yml configuration:

global:
  edition: "community"
 
  jobs:
    resources:
      limits:
        cpu: 1000m
        memory: 12Gi
      requests:
        cpu: 500m
        memory: 2Gi
 
  env_vars:
    HTTP_IDLE_TIMEOUT: 1800s
    DEBEZIUM_MAX_QUEUE_SIZE_IN_BYTES: 536870912
    #LOG_LEVEL: DEBUG
    CDC_LOG_LEVEL: DEBUG
    #DEBEZIUM_LOG_LEVEL: DEBUG
    MSSQL_CDC_LOG_LEVEL: DEBUG
    JOB_MAIN_CONTAINER_MEMORY_REQUEST: 2Gi
    JOB_MAIN_CONTAINER_MEMORY_LIMIT: 15Gi
    NORMALIZATION_JOB_MAIN_CONTAINER_MEMORY_REQUEST: 2Gi
    NORMALIZATION_JOB_MAIN_CONTAINER_MEMORY_LIMIT: 8Gi
    JAVA_OPTS: "-XX:+ExitOnOutOfMemoryError -XX:MaxRAMPercentage=80.0 -XX:+UseG1GC"
 
webapp:
  ingress:
    annotations:
      kubernetes.io/ingress.class: internal
      nginx.ingress.kubernetes.io/proxy-body-size: 16m
      nginx.ingress.kubernetes.io/proxy-send-timeout: 1800
      nginx.ingress.kubernetes.io/proxy-read-timeout: 1800
 
airbyte-bootloader:
  resources:
    limits:
      cpu: 1000m
      memory: 5Gi
    requests:
      cpu: 500m
      memory: 1Gi
 
worker:
  enabled: true
  # -- Number of worker replicas
  replicaCount: 1
 
  image:
    # -- The repository to use for the airbyte worker image.
    repository: airbyte/worker
    # -- the pull policy to use for the airbyte worker image
    pullPolicy: IfNotPresent
 
  ## worker resource requests and limits
  ## ref: http://kubernetes.io/docs/user-guide/compute-resources/
  ## We usually recommend not to specify default resources and to leave this as a conscious
  ## choice for the user. This also increases chances charts run on environments with little
  ## resources, such as Minikube. If you do want to specify resources, uncomment the following
  ## lines, adjust them as necessary, and remove the curly braces after 'resources:'.
  resources:
    # -- The resources limits for the worker container
    limits:
      memory: 5Gi
      cpu: 500m
    # -- The requested resources for the worker container
    requests:
      memory: 1Gi
      cpu: 250m

Related GitHub discussion, opened on Nov 05: https://github.com/airbytehq/airbyte/discussions/48348?sort=new

1 Answer

This doesn't seem to have anything to do with SQL Server CDC as such; rather, the JVM heap space is insufficient for the volume of data the Airbyte worker is attempting to process.

I haven't used Airbyte, but heap size is configurable at the JVM level. The values.yml file shown caps the JVM heap at 80% of the available RAM (-XX:MaxRAMPercentage=80.0).

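I'm guessing this means the JVM has access to 80% of the memory configured for the worker container. If I am understanding the configuration file correctly, and the JVM saw only the worker's 1Gi request, that would be as little as 80% of 1 GiB, i.e. about 819 MiB (roughly 859 MB). Note, however, that the JVM's container support sizes the heap from the container's memory limit rather than its request, so with the worker's 5Gi limit the ceiling would be closer to 4 GiB.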
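
Worked through, the arithmetic looks like this, assuming the heap ceiling is simply MaxRAMPercentage applied to the memory the JVM treats as available (other heap-sizing flags are ignored). The two cases are the worker's 1Gi request and its 5Gi limit from the config:

```latex
\text{MaxHeap} \approx \frac{\text{MaxRAMPercentage}}{100} \times M_{\text{container}}

0.8 \times 1\,\text{GiB} = 0.8 \times 1024\,\text{MiB} = 819.2\,\text{MiB} \approx 858{,}993{,}459\ \text{bytes} \approx 859\ \text{MB}

0.8 \times 5\,\text{GiB} = 4\,\text{GiB}
```

The MiB/MB distinction matters here: 1 GiB is 1024 MiB (binary), while 859 MB uses decimal megabytes.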
