
I have a main script as below

from pyspark.sql.session import SparkSession
..............
..............
..............
import callmodule as cm <<<--- This is imported from another pyspark script which is in callmod.zip file
..............
..............
..............

When I submit the spark command as below, it fails with Error: No module named Callmodule:

spark-submit --master local --py-files C:\pyspark\scripts\callmod.zip mainscript.py

When I submit the spark command with the driver classpath (without the executor classpath) as below, it runs successfully:

spark-submit --master local --py-files C:\pyspark\scripts\callmod.zip --driver-class-path C:\pyspark\scripts\callmod.zip mainscript.py
                                (or)
spark-submit --master local --py-files C:\pyspark\scripts\callmod.zip --conf spark.driver.extraClassPath=C:\pyspark\scripts\callmod.zip mainscript.py
                                (or)
spark-submit --master local --py-files C:\pyspark\scripts\callmod.zip --conf spark.driver.extraLibraryPath=C:\pyspark\scripts\callmod.zip mainscript.py

When I submit the spark command with the executor classpath (without the driver classpath), it also runs successfully:

spark-submit --master local --py-files C:\pyspark\scripts\callmod.zip --conf spark.executor.extraClassPath=C:\pyspark\scripts\callmod.zip mainscript.py
                               (or)
spark-submit --master local --py-files C:\pyspark\scripts\callmod.zip --conf spark.executor.extraLibraryPath=C:\pyspark\scripts\callmod.zip mainscript.py

Can you explain where the below import statement runs: on the driver or on the executor?

import callmodule as cm

Why does the code not fail with Error: No module named callmodule when only the driver classpath is set, or only the executor classpath is set?

1 Answer

You are using --master local, so the driver runs in the same JVM as the executor. Setting the classpath on either the driver or the executor therefore produces the same behaviour, and neither causes an error.


4 Comments

Even in local mode, won't the driver and executor run in separate JVMs? In that case the executor won't see the driver's classpath, and vice versa. Isn't it?
@shankar I don't think so. In local mode there is only one JVM; the driver and the executor are the same thing, on the same JVM.
Okay, got it. So if I have to run the spark-submit job with the --py-files option on a real cluster, which is the right one to use: --driver-class-path, the executor classpath, or both?
@shankar I think you'll need to specify both, so that both the driver and the executors can find the files.
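As background for the discussion above: --py-files distributes the zip and puts it on the Python path (sys.path), which is what makes import callmodule succeed; it is a different mechanism from the JVM classpath options. Below is a minimal stdlib-only sketch of that mechanism, assuming a made-up module body inside the zip for illustration (the real callmod.zip contents are not shown in the question):

```python
# Sketch of what --py-files does for Python imports: a zip archive placed on
# sys.path becomes importable via Python's built-in zipimport support.
import os
import sys
import tempfile
import zipfile

# Build a stand-in callmod.zip containing a trivial callmodule.py
# (the GREETING constant is invented purely for this demo).
tmp_dir = tempfile.mkdtemp()
zip_path = os.path.join(tmp_dir, "callmod.zip")
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("callmodule.py", "GREETING = 'hello from callmodule'\n")

# This is roughly what --py-files / SparkContext.addPyFile arrange on each
# Python process: the zip is prepended to the module search path.
sys.path.insert(0, zip_path)

import callmodule as cm  # now resolves from inside the zip
print(cm.GREETING)       # prints: hello from callmodule
```

On a cluster, Spark performs this path setup on the driver and on every executor's Python worker, which is why the import works on both sides when the zip is shipped with --py-files.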
