
Description:

  1. Background: I am new to Flink and have no prior experience with big data. My knowledge of Java or Linux is also limited.

  2. Requirement: I want to use Flink CDC to run a simple table-synchronization test between SQL Server databases, specifically using a YAML pipeline definition instead of Java code.

  3. Environment Information:

  Name               Version
  Flink              flink-1.19.1
  Flink CDC          flink-cdc-3.1.0
  SQL Server         SQL Server 2022
  JDK                OpenJDK 11.0.23
  Operating system   CentOS 7.9

Steps and Issues Encountered:

  1. First, I verified the environment: using the Flink SQL client, I executed CREATE TABLE and INSERT INTO statements and successfully synchronized data from TEST_FOR_FLINK.dbo.orders to DW.dbo.orders_dw.

  2. Then, following the documentation, I extracted the following archive into the corresponding directories under /flink-1.19.1:

  • flink-cdc-3.1.0-bin.tar.gz

  3. Copied the following files to /flink-1.19.1/lib/:

  • flink-connector-jdbc-3.1.2-1.18.jar
  • mssql-jdbc-12.6.3.jre11.jar
  • flink-sql-connector-sqlserver-cdc-3.1.0.jar

  4. Created the following file, mssql-to-mssql-test01.yaml, in /flink-1.19.1/pipeline/:
source:
  type: sqlserver-cdc
  hostname: x.x.x.x
  port: 1433
  username: sa
  password: xxxxxx
  database: TEST_FOR_FLINK
  tables: dbo.orders
  server-time-zone: UTC

sink:
  type: jdbc
  driver: com.microsoft.sqlserver.jdbc.SQLServerDriver
  url: jdbc:sqlserver://x.x.x.x:1433;databaseName=DW
  username: sa
  password: xxxxxx
  table-name: dbo.orders_dw

pipeline:
  name: Sync MSSQL Database to MSSQL
  parallelism: 2
  5. Execute the following command:
[root@master pipeline]# flink-cdc.sh mssql-to-mssql-test01.yaml

Error Message:


Exception in thread "main" java.lang.RuntimeException: Cannot find factory with identifier "sqlserver-cdc" in the classpath.

Available factory classes are:

        at org.apache.flink.cdc.composer.utils.FactoryDiscoveryUtils.getFactoryByIdentifier(FactoryDiscoveryUtils.java:62)
        at org.apache.flink.cdc.composer.flink.translator.DataSourceTranslator.translate(DataSourceTranslator.java:47)
        at org.apache.flink.cdc.composer.flink.FlinkPipelineComposer.compose(FlinkPipelineComposer.java:101)
        at org.apache.flink.cdc.cli.CliExecutor.run(CliExecutor.java:71)
        at org.apache.flink.cdc.cli.CliFrontend.main(CliFrontend.java:71)

I hope someone can point out what I did wrong. The official documentation does not provide an ETL example for SQL Server to SQL Server using a YAML file. If YAML files cannot be used for this kind of data transfer, where can I find documentation or a Java demo project that uses the Java + Table API for data synchronization?

1 Answer

The sqlserver-cdc pipeline connector is not supported yet; you can follow the PR https://github.com/apache/flink-cdc/pull/3445
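Until that PR lands, the same sync can be expressed with the Table API on top of the existing sqlserver-cdc source connector and JDBC sink (the jars you already placed in /flink-1.19.1/lib/). Below is a minimal sketch, not a definitive implementation: the orders schema is invented for illustration, and the connector option names are assumed from the sqlserver-cdc SQL connector and JDBC connector docs (some CDC versions use a separate 'schema-name' option instead of a schema-qualified 'table-name'), so check them against your version.

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class MssqlToMssqlSync {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // CDC source over TEST_FOR_FLINK.dbo.orders.
        // The column list is illustrative -- adapt it to the real table.
        tEnv.executeSql(
                "CREATE TABLE orders_src ("
                        + "  order_id INT,"
                        + "  customer STRING,"
                        + "  amount DECIMAL(10, 2),"
                        + "  PRIMARY KEY (order_id) NOT ENFORCED"
                        + ") WITH ("
                        + "  'connector' = 'sqlserver-cdc',"
                        + "  'hostname' = 'x.x.x.x',"
                        + "  'port' = '1433',"
                        + "  'username' = 'sa',"
                        + "  'password' = 'xxxxxx',"
                        + "  'database-name' = 'TEST_FOR_FLINK',"
                        + "  'table-name' = 'dbo.orders'"
                        + ")");

        // JDBC upsert sink into DW.dbo.orders_dw; needs the JDBC connector
        // and the mssql-jdbc driver on the classpath.
        tEnv.executeSql(
                "CREATE TABLE orders_dw ("
                        + "  order_id INT,"
                        + "  customer STRING,"
                        + "  amount DECIMAL(10, 2),"
                        + "  PRIMARY KEY (order_id) NOT ENFORCED"
                        + ") WITH ("
                        + "  'connector' = 'jdbc',"
                        + "  'url' = 'jdbc:sqlserver://x.x.x.x:1433;databaseName=DW',"
                        + "  'driver' = 'com.microsoft.sqlserver.jdbc.SQLServerDriver',"
                        + "  'table-name' = 'dbo.orders_dw',"
                        + "  'username' = 'sa',"
                        + "  'password' = 'xxxxxx'"
                        + ")");

        // INSERT INTO on a streaming source submits a continuous sync job.
        tEnv.executeSql("INSERT INTO orders_dw SELECT * FROM orders_src");
    }
}
```

Package this with the Flink and connector dependencies marked as provided, then submit it with bin/flink run; it replicates what the YAML pipeline would do once the pipeline connector is available.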
