I want to launch a Spark job on EMR Serverless from Airflow. I want to use Spark 3.3.0 with Scala 2.13, but the EMR 6.9.0 release ships with Scala 2.12. I built a fat JAR that bundles all the Spark dependencies, but that does not work either. As an alternative, I am trying to use an EMR custom image by creating the application with the equivalent of --image-configuration through the Airflow operator, but the operator does not seem to pass through all the arguments that the boto API accepts:
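from airflow.providers.amazon.aws.operators.emr import EmrServerlessCreateApplicationOperator

# The config dict is forwarded to the boto3 create_application call.
create_app = EmrServerlessCreateApplicationOperator(
    task_id="create_my_app",
    job_type="SPARK",
    release_label="emr-6.9.0",
    config={
        "name": "data-ingestion",
        "imageConfiguration": {
            "imageUri": "xxxxxxx.dkr.ecr.eu-west-1.amazonaws.com/emr-custom-image:0.0.1"
        },
    },
)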
Airflow gives the following error message:
Unknown parameter in input: "imageConfiguration", must be one of:
name, releaseLabel, type, clientToken, initialCapacity, maximumCapacity, tags, autoStartConfiguration, autoStopConfiguration, networkConfiguration
Passing imageUri at the top level of the config does not work either:
    config={
        "name": "data-ingestion",
        "imageUri": "xxxxxxx.dkr.ecr.eu-west-1.amazonaws.com/emr-custom-image:0.0.1",
    }
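One way to narrow this down is the sketch below. It assumes the error comes from botocore's client-side parameter validation (which is how "Unknown parameter in input" messages are normally produced) and not from the EMR Serverless service itself. It compares the config keys against the parameter list in the error message, and, if botocore is importable, lists the CreateApplication parameters that the installed botocore knows about. The names ALLOWED_FROM_ERROR, rejected_keys, and installed_create_application_params are hypothetical helpers made up for this illustration; the botocore calls only read the local service model and need neither credentials nor network access.

```python
# Parameter names copied from the error message above.
ALLOWED_FROM_ERROR = {
    "name", "releaseLabel", "type", "clientToken", "initialCapacity",
    "maximumCapacity", "tags", "autoStartConfiguration",
    "autoStopConfiguration", "networkConfiguration",
}


def rejected_keys(config, allowed):
    """Return the top-level config keys that are not in the allowed set."""
    return sorted(key for key in config if key not in allowed)


def installed_create_application_params():
    """Return the CreateApplication input members known to the installed
    botocore, or None if botocore cannot be imported."""
    try:
        import botocore.session
    except ImportError:
        return None
    session = botocore.session.get_session()
    model = session.get_service_model("emr-serverless")
    operation = model.operation_model("CreateApplication")
    return set(operation.input_shape.members)


if __name__ == "__main__":
    config = {
        "name": "data-ingestion",
        "imageConfiguration": {
            "imageUri": "xxxxxxx.dkr.ecr.eu-west-1.amazonaws.com/emr-custom-image:0.0.1"
        },
    }
    # Keys that the validation in the error message would reject.
    print("rejected by the listed parameters:", rejected_keys(config, ALLOWED_FROM_ERROR))

    known = installed_create_application_params()
    if known is None:
        print("botocore is not installed in this environment")
    else:
        # If this prints False, the installed botocore may simply predate
        # the imageConfiguration parameter.
        print("imageConfiguration known to botocore:", "imageConfiguration" in known)
```

If the second check prints False in the Airflow environment, that would suggest the limitation sits in the installed boto3/botocore version rather than in the operator, though I have not confirmed this.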
Does anybody have any ideas other than downgrading my Scala version?