
I have a PySpark code/application. What is the best way to run it (to utilize the maximum power of PySpark): using the Python interpreter or using spark-submit?

The SO answer here was similar but did not explain it in great detail. I would love to know why.

Any help is appreciated. Thanks in advance.

2 Answers


Running your job in the pyspark shell will always be in client mode, whereas with spark-submit you can execute it in either mode, i.e. client or cluster.
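A minimal sketch of that difference (the YARN master and the script name `my_app.py` are placeholder assumptions, not from the answer):

```shell
# pyspark shell: the driver always runs on the machine you launch it from (client mode)
pyspark --master yarn

# spark-submit: you choose where the driver runs with --deploy-mode
spark-submit --master yarn --deploy-mode client my_app.py   # driver on your machine
spark-submit --master yarn --deploy-mode cluster my_app.py  # driver inside the cluster
```

Cluster mode is what lets the driver itself run on a cluster node, which the interactive shell can never do.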


1 Comment

+1. Thanks for the answer. I discovered this some days back, but it is surely a valuable answer for making the most out of Spark.

I am assuming that when you say Python interpreter you are referring to the pyspark shell.

You can run your Spark code in several ways: using the pySpark interpreter, using spark-submit, or even with the various available notebooks (Jupyter/Zeppelin).

  1. When to use PySpark Interpreter.

Generally, we use the pySpark interpreter when we are learning or doing very basic operations for understanding or exploration purposes.

  2. Spark Submit.

This is usually used when you have written your entire application in pySpark and packaged it into .py files, so that you can submit your entire code to the Spark cluster for execution.

A little analogy may help here. Take Unix shell commands as an example: we can execute shell commands directly at the command prompt, or we can create a shell script (.sh) to execute a bunch of instructions at once. You can think of the pyspark interpreter and the spark-submit utility in the same way: in the pySpark interpreter you execute individual commands interactively, whereas with spark-submit you package your Spark application into .py files and execute it as a whole.
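The analogy can be sketched roughly as follows (the file names `data.csv` and `my_job.py` are placeholders for illustration):

```shell
# Interactive, like typing shell commands one at a time:
# the pyspark shell pre-creates a SparkSession named `spark` for you
$ pyspark
>>> df = spark.read.csv("data.csv", header=True)
>>> df.count()

# Batch, like running a shell script: package the same logic
# into a .py file and submit it all at once
$ spark-submit --master local[*] my_job.py
```

Note that a submitted .py file must build its own SparkSession (e.g. via `SparkSession.builder.getOrCreate()`), since only the interactive shell provides one automatically.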

Hope this helps.

Regards,

Neeraj

3 Comments

Ahh.. Python in the sense of running it via python script.py
Yes, write your pyspark code as a .py file and submit it using the spark-submit utility, e.g. spark-submit test.py
Just an add-on: can you add some technical views on how spark-submit differs from running it in the pyspark shell?
