
I'm following this installation guide but have the following problem when using graphframes:

from pyspark import SparkContext
sc = SparkContext()
!pyspark --packages graphframes:graphframes:0.5.0-spark2.1-s_2.11
from graphframes import *

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
in ()
----> 1 from graphframes import *

ImportError: No module named graphframes

I'm not sure whether it is possible to install the package this way, but I would appreciate your advice and help.

3 Comments
  • This might help: community.hortonworks.com/questions/61386/… Commented May 11, 2018 at 6:30
  • Looks like a nice workaround. Definitely worth a try, but I suspect there should be a more general solution. Commented May 11, 2018 at 6:37
  • scala> util.Properties.versionNumberString res0: String = 2.12.4 Commented May 11, 2018 at 13:31

4 Answers


Good question!

Open your .bashrc file and add export SPARK_OPTS="--packages graphframes:graphframes:0.5.0-spark2.1-s_2.11". Once you have saved the file, close it and run source .bashrc.

Finally, open up your notebook and type:

from pyspark import SparkContext
sc = SparkContext()
# Distribute the graphframes jar and add it to the Python search path
sc.addPyFile('/home/username/spark-2.3.0-bin-hadoop2.7/jars/graphframes-0.5.0-spark2.1-s_2.11.jar')

After that, you should be able to run it.
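As a quick smoke test, something like the following should work (a minimal sketch: the toy data and the SparkSession created from sc are illustrative and not part of the original answer, and it assumes the jar path above matches your install):

from pyspark.sql import SparkSession
from graphframes import GraphFrame

# Reuse the SparkContext created above to get a SparkSession for DataFrames
spark = SparkSession(sc)

# Illustrative toy graph: vertices need an "id" column, edges need "src" and "dst"
v = spark.createDataFrame([("a", "Alice"), ("b", "Bob")], ["id", "name"])
e = spark.createDataFrame([("a", "b", "follows")], ["src", "dst", "relationship"])

g = GraphFrame(v, e)
g.inDegrees.show()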


I'm using a Jupyter notebook in Docker, trying to get graphframes working. First, I used the method from https://stackoverflow.com/a/35762809/2202107:

import findspark
findspark.init()
import pyspark
import os

# PYSPARK_SUBMIT_ARGS must be set before the SparkContext is created
SUBMIT_ARGS = "--packages graphframes:graphframes:0.7.0-spark2.4-s_2.11 pyspark-shell"
os.environ["PYSPARK_SUBMIT_ARGS"] = SUBMIT_ARGS

conf = pyspark.SparkConf()
sc = pyspark.SparkContext(conf=conf)
print(sc._conf.getAll())

Then, following this issue, we are finally able to import graphframes: https://github.com/graphframes/graphframes/issues/172

import sys
# --packages lists the downloaded graphframes archive under spark.submit.pyFiles,
# but it is not added to the driver's sys.path automatically, so add it by hand
pyfiles = str(sc.getConf().get(u'spark.submit.pyFiles')).split(',')
sys.path.extend(pyfiles)
from graphframes import *
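If the import still fails, a couple of illustrative sanity checks (assuming the context created above) can show whether the package actually made it into the configuration and onto the Python path:

# Illustrative checks: --packages should be reflected in spark.jars.packages,
# and the path extension above should have put graphframes on sys.path
print(sc.getConf().get('spark.jars.packages'))
print([p for p in sys.path if 'graphframes' in p.lower()])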

2 Comments

  • This is better than the answer by @Pranav A, since I don't need to fiddle with paths manually.
  • Thank you! This worked for me. Can anyone help me understand what the commands here are doing that makes it work?

The simplest way to start Jupyter with pyspark and graphframes is to start Jupyter from pyspark itself.

Just open your terminal, set the two environment variables, and start pyspark with the graphframes package:

export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS=notebook
pyspark --packages graphframes:graphframes:0.6.0-spark2.3-s_2.11

The advantage of this is also that if you later want to run your code via spark-submit, you can use the same start command.
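In a notebook launched this way, the pyspark launcher already defines sc and spark for you, so in my experience no extra path handling is needed; a minimal first-cell check might look like this:

# sc and spark are predefined by the pyspark launcher in this setup
print(sc.version)
from graphframes import GraphFrame  # should import without any sys.path tweaks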


I went down a long, painful road to find a solution that works here.

I am working with the native Jupyter server within VS Code. There, I created a .env file:

SPARK_HOME=/home/adam/projects/graph-algorithms-book/spark-3.2.0-bin-hadoop3.2
JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
PYSPARK_SUBMIT_ARGS="--driver-memory 2g --executor-memory 6g --packages graphframes:graphframes:0.8.2-spark3.2-s_2.12 pyspark-shell"

Then in my python notebook I have something that looks like the following:

from pyspark.sql.types import *
from graphframes import *

from pyspark.sql.session import SparkSession
spark = SparkSession.builder.appName('GraphFrames').getOrCreate()

You should see output showing the dependencies being resolved and fetched, something like this:

:: loading settings :: url = jar:file:/home/adam/projects/graph-algorithms-book/spark-3.2.0-bin-hadoop3.2/jars/ivy-2.5.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
Ivy Default Cache set to: /home/adam/.ivy2/cache
The jars for the packages stored in: /home/adam/.ivy2/jars
graphframes#graphframes added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-96a3a1f1-4ea4-4433-856b-042d0269ec1a;1.0
    confs: [default]
    found graphframes#graphframes;0.8.2-spark3.2-s_2.12 in spark-packages
    found org.slf4j#slf4j-api;1.7.16 in central
:: resolution report :: resolve 174ms :: artifacts dl 8ms
    :: modules in use:
    graphframes#graphframes;0.8.2-spark3.2-s_2.12 from spark-packages in [default]
    org.slf4j#slf4j-api;1.7.16 from central in [default]
    ---------------------------------------------------------------------
    |                  |            modules            ||   artifacts   |
    |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
    ---------------------------------------------------------------------
    |      default     |   2   |   0   |   0   |   0   ||   2   |   0   |
    ---------------------------------------------------------------------

After that, I was able to create some code with the relationships:

v = spark.createDataFrame([
  ("a", "Alice", 34),
  ("b", "Bob", 36),
  ("c", "Charlie", 30),
], ["id", "name", "age"])
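To round this out, the matching edge DataFrame and the GraphFrame itself can be built the same way (the edge rows below are illustrative, not from the original answer):

from graphframes import GraphFrame

# Illustrative edges for the vertices above
e = spark.createDataFrame([
  ("a", "b", "friend"),
  ("b", "c", "follow"),
], ["src", "dst", "relationship"])

g = GraphFrame(v, e)
g.inDegrees.show()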

It should work fine. Just remember to align all your pyspark versions. I had to install the proper version of graphframes from a forked repo, since the PyPI package is behind on versions, so I used the PHPirates repo to do the install. There, graphframes has been compiled for version 3.2.0 of pyspark.

pip install "git+https://github.com/PHPirates/[email protected]#egg=graphframes&subdirectory=python"
pip install pyspark==3.2.0
