
I am trying to migrate PySpark code from a Jupyter notebook to a Python script. However, when I tried to use

from pyspark.sql import SparkSession

I got the error No module named 'pyspark'.

  • I located every python3 and python2 interpreter on the system, launched each one as a shell, and tried to import pyspark in each. I got the same No module named 'pyspark' error in every shell.

  • When I tried to import findspark under python3/python2, I got No module named 'findspark'.

  • echo $PYTHONPATH and echo $SPARK_HOME both return empty strings.
  • I located every spark-submit on the system and ran my script with each of them instead of python3. However, I got an error related to my argparse usage:

    File "/export/home/osvechkarenko/brdmp_10947/automation_001/py_dynamic_report.py", line 206
    if args.print:
                ^
    SyntaxError: invalid syntax
    

    When I ran my script with python3 (without pyspark), it worked fine.
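For what it's worth, the SyntaxError above suggests spark-submit was invoking Python 2, where print is a reserved keyword, so args.print cannot even be parsed. A minimal sketch (the --print flag mirrors the script above; the getattr workaround is my assumption, not from the original script):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--print", action="store_true")
args = parser.parse_args(["--print"])

# Python 3 accepts `args.print`; Python 2 rejects it at parse time
# because `print` is a keyword there. getattr sidesteps the keyword:
print(getattr(args, "print"))  # → True
```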

  • Could you provide the output of pyspark.__file__ from the environment where it works? That would help identify which of your environments is functional. Commented Oct 30, 2018 at 2:14

1 Answer


First, make sure the Python interpreter is identical for Jupyter and your shell by running the following in both:

import sys
print(sys.executable)

If the interpreters match, then your Jupyter kernel is additionally adding pyspark to the Python path on startup. As @Sraw pointed out, you can locate pyspark via pyspark.__file__ within your working environment.
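As a quick check, you can test whether any given interpreter can see pyspark without triggering an ImportError (using importlib.util.find_spec is my suggestion, not part of the original answer):

```python
import importlib.util

# find_spec returns None when the module is not importable from this
# interpreter; otherwise spec.origin shows where it was found.
spec = importlib.util.find_spec("pyspark")
if spec is None:
    print("pyspark is not importable from this interpreter")
else:
    print("pyspark found at:", spec.origin)
```

Running this under each python3/python2 you found would pinpoint which interpreter, if any, has pyspark on its path.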

Here is a short bash script showing how pyspark can be manually added to an existing Jupyter kernel under Ubuntu 16.10: link
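Alternatively, the same effect can be sketched in Python itself, which is essentially what findspark does at runtime (the /opt/spark path below is an illustrative assumption, not from the answer):

```python
import glob
import os
import sys

# Assumed example location of a local Spark install; adjust to yours.
spark_home = os.environ.get("SPARK_HOME", "/opt/spark")

# Spark's Python API lives under $SPARK_HOME/python, and py4j is bundled
# as a zip under python/lib; both must be on sys.path before
# `import pyspark` can succeed in a plain interpreter.
sys.path.insert(0, os.path.join(spark_home, "python"))
sys.path[:0] = glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*-src.zip"))
```

Putting the equivalent export PYTHONPATH lines in your shell profile would also make spark-submit and plain python3 agree on where pyspark lives.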
