I am using a Cloud Composer environment to run workflows in a GCP project. One of my workflows creates a Dataproc cluster in a different project using the DataprocClusterCreateOperator, and then attempts to submit a PySpark job to that cluster using the DataProcPySparkOperator from the airflow.contrib.operators.dataproc_operator module.
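The cluster-creation task looks roughly like this (the project, cluster, and zone values below are placeholders for my real configuration):

from airflow.contrib.operators.dataproc_operator import DataprocClusterCreateOperator

# Create the cluster in the other project by passing project_id explicitly
create_cluster = DataprocClusterCreateOperator(
    task_id='create_dataproc_cluster',
    project_id='my-gcp-project',   # the project that should own the cluster
    cluster_name='my-cluster',
    num_workers=2,
    zone='us-central1-a',
)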
As shown above, I can pass a project_id parameter when creating the cluster so that it is created in the other project, but DataProcPySparkOperator seems to ignore this parameter. For example, I expected to be able to pass a project_id in the same way, but the task fails with a 404 error when it runs:
from airflow.contrib.operators.dataproc_operator import DataProcPySparkOperator

t1 = DataProcPySparkOperator(
    task_id='submit_pyspark_job',  # task_id is required by every Airflow operator
    project_id='my-gcp-project',
    main='...',
    arguments=[...],
)
How can I use DataProcPySparkOperator to submit a job to a cluster in another project?