
So, here is my problem. I have a PySpark job stored in DBFS, because I don't have access to Databricks Repos due to org policy, and I also cannot create a new cluster when setting up a spark-submit job, again due to org policy. Is there any way I can execute the PySpark job and pass parameters to it?


1 Answer


Unfortunately, the Spark Submit task needs a new cluster. Depending on how your PySpark job is packaged, you can try the following (see the task type dropdown):


  • Use the Python script task - it allows you to run a Python file directly from DBFS.


  • Use the Python wheel task - if your code is packaged as a wheel file.

Both of these task types support execution on an existing interactive cluster, but that will cost you more.
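
For illustration, here is a minimal sketch of how the first option could be wired up through the Databricks Jobs API 2.1: a Python script task that runs a file from DBFS on an existing interactive cluster and receives parameters. The workspace URL, token, cluster ID, script path, and parameter values below are all placeholders, not values from the question.

    # Sketch only: create a job whose task runs a Python script stored in DBFS
    # on an existing interactive cluster (Databricks Jobs API 2.1).
    # Every value in angle brackets is a placeholder.
    import requests

    workspace_url = "https://<your-workspace>.cloud.databricks.com"  # placeholder
    token = "<personal-access-token>"                                # placeholder

    job_spec = {
        "name": "pyspark-script-from-dbfs",
        "tasks": [
            {
                "task_key": "run_script",
                "existing_cluster_id": "<existing-cluster-id>",       # placeholder
                "spark_python_task": {
                    "python_file": "dbfs:/<my folder>/<myfilename>.py",
                    # parameters are passed to the script as command-line arguments
                    "parameters": ["--env", "dev", "--run-date", "2023-01-30"],
                },
            }
        ],
    }

    resp = requests.post(
        f"{workspace_url}/api/2.1/jobs/create",
        headers={"Authorization": f"Bearer {token}"},
        json=job_spec,
    )
    resp.raise_for_status()
    print(resp.json())  # returns the new job_id

Inside the script itself, the parameters arrive as ordinary command-line arguments, so they can be read with sys.argv or argparse.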


Comments

Hi Alex, thanks for your reply. Is the cluster that spark-submit creates a temporary cluster? Does it cost less than this cluster?
Yes, when you create a temporary cluster for a job, it's usually almost 2 times cheaper than an interactive cluster (it depends on the tier - standard vs. premium).
Thanks. So I tried reading the file from DBFS, but when I read it in my PySpark code in a notebook, I get an error message saying no such file or directory, followed by something like /local_disk0/spark-/userfiles-../dbfs:/<my folder>/<myfilename>. Why is it not finding the file?
It depends on how you do it…
Actually, the script I'm debugging adds a file to an HDFS Cloudera nameservice path, and I'm passing a DBFS file path to that function, which is where it's giving the error.
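
One common cause of that kind of error is passing a dbfs:/ URI to code that uses local file APIs (open(), shutil, or a hadoop fs call shelled out from the driver), which only see the driver's local filesystem. On most Databricks clusters, DBFS is also exposed through the /dbfs FUSE mount, so a quick check, with a placeholder path, could look like this:

    # Sketch: local (non-Spark) file APIs cannot read "dbfs:/..." URIs directly.
    # On Databricks, DBFS is usually also mounted at /dbfs, so rewrite the path first.
    # The path below is a placeholder.
    import os

    spark_path = "dbfs:/<my folder>/<myfilename>"           # works with Spark APIs
    local_path = spark_path.replace("dbfs:/", "/dbfs/", 1)  # works with open(), os, shutil

    print(os.path.exists(local_path))  # True if the file really is in DBFS
    with open(local_path, "rb") as f:
        print(f.read(100))             # peek at the first bytes

If the file shows up under /dbfs/..., passing that local path to the function that uploads to the Cloudera HDFS nameservice may get past the "no such file or directory" error.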