So, here is my problem. I have a PySpark job stored in DBFS, since I don't have access to Databricks Repos due to org policy, and I also cannot create a new cluster when setting up a spark-submit job, again due to org policy. Is there any way I can execute the PySpark job and pass parameters to it?
1 Answer
Unfortunately, the Spark Submit task requires a new cluster. Depending on how your PySpark job is packaged, you can try the following (see the task type dropdown):
- Use the Python script task - it lets you pick a Python file stored in DBFS
- Use the Python wheel task - if your code is packaged as a wheel file
Both of these task types support execution on an existing interactive cluster, but that will cost you more.
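If it helps, here is a rough sketch of what the Python script task looks like when created through the Jobs API 2.1 (the same thing the task type dropdown does in the UI). The workspace URL, token, cluster ID, script path, and parameters below are all placeholders, so adjust them to your environment; the parameters list shows up in your script as sys.argv.

```python
# Minimal sketch (placeholders, not a tested job definition): create a job whose
# single task is a "Python script" task (spark_python_task) that reads the script
# from DBFS and runs it on an EXISTING interactive cluster.
import requests

DATABRICKS_HOST = "https://<your-workspace>.cloud.databricks.com"  # placeholder
TOKEN = "<personal-access-token>"                                   # placeholder

job_spec = {
    "name": "pyspark-job-from-dbfs",
    "tasks": [
        {
            "task_key": "run_script",
            # Reuse an already-running interactive cluster instead of a new job cluster
            "existing_cluster_id": "<interactive-cluster-id>",       # placeholder
            "spark_python_task": {
                "python_file": "dbfs:/<my folder>/<myfilename>",     # script stored in DBFS
                # These arrive in the script as sys.argv[1:]
                "parameters": ["--input", "dbfs:/<my folder>/input", "--run-date", "2023-01-01"],
            },
        }
    ],
}

resp = requests.post(
    f"{DATABRICKS_HOST}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=job_spec,
)
resp.raise_for_status()
print("Created job:", resp.json()["job_id"])
```

A Python wheel task would look much the same, just with a python_wheel_task block (package_name / entry_point) in place of spark_python_task.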
6 Comments
DexterMe
Hi Alex, thanks for your reply. The cluster that Spark Submit creates is a temporary cluster? Does it cost less than an interactive cluster?
Alex Ott
Yes, when you create a temporary cluster for a job, it's usually almost 2 times cheaper than an interactive cluster (depending on the tier - standard vs. premium).
DexterMe
Thanks. So I tried reading the file from DBFS, but when I read it in my PySpark code in a notebook I get an error saying no such file or directory, followed by something like /local_disk0/spark-/userfiles-../dbfs:/<my folder>/<myfilename>. Why is it not finding the file?
Alex Ott
It depends on how you do it…
DexterMe
Actually, the script I'm debugging adds a file to an HDFS Cloudera nameservice path, and I'm passing a DBFS file path to the function, which is where it's giving the error.
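Just a guess from the error text, but that usually happens when a dbfs:/ URI is handed to an API that expects a local filesystem path: the string then gets resolved relative to the job's working directory (the /local_disk0/.../userFiles-... folder in your error). DBFS is also mounted on the driver at /dbfs, so local-path APIs need that form instead. A small illustration with made-up paths:

```python
# Illustration only (assumes the standard Databricks /dbfs FUSE mount,
# not the exact code from this thread): the same DBFS file addressed two ways.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # already defined for you on Databricks

dbfs_uri = "dbfs:/my_folder/myfile.csv"      # placeholder path, dbfs:/ scheme
local_mount = "/dbfs/my_folder/myfile.csv"   # same file via the driver-local mount

# Works: Spark (and dbutils.fs) understand the dbfs:/ scheme
df = spark.read.csv(dbfs_uri, header=True)

# Works: plain Python I/O, or anything expecting a local path, must use /dbfs/...
with open(local_mount) as f:
    first_line = f.readline()

# Fails with "No such file or directory": a local-file API given the dbfs:/ URI
# treats it as a relative path under the job's working directory, producing
# something like /local_disk0/spark-.../userFiles-.../dbfs:/...
# with open(dbfs_uri) as f:
#     ...
```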