I have a main script, shown below:
from pyspark.sql.session import SparkSession
..............
..............
..............
import callmodule as cm  # imported from another PySpark script packaged in callmod.zip
..............
..............
..............
When I submit the job with the command below, it fails with Error: No module named callmodule
spark-submit --master local --py-files C:\pyspark\scripts\callmod.zip mainscript.py
When I submit the job with the driver class path set (and no executor class path), using any of the commands below, it runs successfully.
spark-submit --master local --py-files C:\pyspark\scripts\callmod.zip --driver-class-path C:\pyspark\scripts\callmod.zip mainscript.py
(or)
spark-submit --master local --py-files C:\pyspark\scripts\callmod.zip --conf spark.driver.extraClassPath=C:\pyspark\scripts\callmod.zip mainscript.py
(or)
spark-submit --master local --py-files C:\pyspark\scripts\callmod.zip --conf spark.driver.extraLibraryPath=C:\pyspark\scripts\callmod.zip mainscript.py
When I submit the job with the executor class path set (and no driver class path), it also runs successfully.
spark-submit --master local --py-files C:\pyspark\scripts\callmod.zip --conf spark.executor.extraClassPath=C:\pyspark\scripts\callmod.zip mainscript.py
(or)
spark-submit --master local --py-files C:\pyspark\scripts\callmod.zip --conf spark.executor.extraLibraryPath=C:\pyspark\scripts\callmod.zip mainscript.py
Can you explain where the import statement below is executed: on the driver or on the executor?
import callmodule as cm
Why does the code not fail with Error: No module named callmodule when only the driver class path is set, or only the executor class path is set?
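For context, here is a minimal sketch of how I understand the Python side of this to work, with Spark left out entirely: a module inside a zip archive becomes importable once the archive itself is on sys.path, which is what I assume --py-files is meant to arrange. The archive contents here are a stand-in (a one-function callmodule.py written to a temporary directory), and the helper names build_zip and import_from_zip are hypothetical, not part of my real script.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def build_zip(directory):
    """Create a stand-in callmod.zip containing a tiny callmodule.py."""
    zip_path = os.path.join(directory, "callmod.zip")
    with zipfile.ZipFile(zip_path, "w") as zf:
        # Hypothetical module body: reports where it was loaded from.
        zf.writestr("callmodule.py", "def where():\n    return __file__\n")
    return zip_path


def import_from_zip(zip_path, module_name="callmodule"):
    """Put the archive on sys.path, then import the named module from it."""
    if zip_path not in sys.path:
        sys.path.insert(0, zip_path)
    # Make sure the import system notices the newly added path entry.
    importlib.invalidate_caches()
    return importlib.import_module(module_name)


tmp_dir = tempfile.mkdtemp()
zip_path = build_zip(tmp_dir)
cm = import_from_zip(zip_path)

# The module's __file__ points inside the archive, showing where it resolved.
print(cm.__file__)
```

If the same check (printing sys.path and cm.__file__) were placed at the top of mainscript.py, it should show whether callmod.zip is on the search path of the process that runs the import in each of the spark-submit variants above.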