I am trying to run the code below to create a GraphFrame in PySpark, which is set up on my local machine, but I am getting an error. I am using spark-2.4.0-bin-hadoop2.7.

from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
#spark = SparkSession.builder.appName('fun').getOrCreate()
vertices = spark.createDataFrame([('1', 'Carter', 'Derrick', 50),
                                  ('2', 'May', 'Derrick', 26),
                                  ('3', 'Mills', 'Jeff', 80),
                                  ('4', 'Hood', 'Robert', 65),
                                  ('5', 'Banks', 'Mike', 93),
                                  ('98', 'Berg', 'Tim', 28),
                                  ('99', 'Page', 'Allan', 16)],
                                 ['id', 'name', 'firstname', 'age'])
edges = spark.createDataFrame([('1', '2', 'friend'),
                               ('2', '1', 'friend'),
                               ('3', '1', 'friend'),
                               ('1', '3', 'friend'),
                               ('2', '3', 'follows'),
                               ('3', '4', 'friend'),
                               ('4', '3', 'friend'),
                               ('5', '3', 'friend'),
                               ('3', '5', 'friend'),
                               ('4', '5', 'follows'),
                               ('98', '99', 'friend'),
                               ('99', '98', 'friend')],
                              ['src', 'dst', 'type'])
g = GraphFrame(vertices, edges)

I am getting the error below.

[Screenshot of the error message; image not available]

2 Answers

The following seems to work for me.

  1. Download the .jar file from https://spark-packages.org/package/graphframes/graphframes
  2. Since I had PySpark running on Anaconda, I added the .jar file to its jars directory, /anaconda3/lib/python3.7/site-packages/pyspark/jars/, alongside the other .jar files.
  3. Then, the following script seems to work.
# Ref: https://stackoverflow.com/a/50404308/9331359
from pyspark import SparkContext
context = SparkContext()
context.addPyFile('/anaconda3/lib/python3.7/site-packages/pyspark/jars/graphframes-0.7.0-spark2.4-s_2.11.jar')
context


# Ref: https://stackoverflow.com/a/55430066/9331359
from pyspark.sql.session import SparkSession
spark = SparkSession(context)

from pyspark.sql.types import *
from graphframes import *
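
Before running the script above, it can help to confirm that step 2 actually put the jar where PySpark looks for it. The following is a minimal sketch, not part of the original answer: the helper name find_graphframes_jars is hypothetical, and the file name pattern is an assumption based on the jar name used above (graphframes-0.7.0-spark2.4-s_2.11.jar).

```python
import glob
import os


def find_graphframes_jars(jars_dir):
    """Return the sorted file names of graphframes jars found in jars_dir.

    Hypothetical helper: it only inspects the file system and does not
    start Spark. The 'graphframes-*.jar' pattern is an assumption based
    on the jar name used in this answer.
    """
    pattern = os.path.join(jars_dir, "graphframes-*.jar")
    return sorted(os.path.basename(path) for path in glob.glob(pattern))
```

Calling find_graphframes_jars('/anaconda3/lib/python3.7/site-packages/pyspark/jars/') should return a non-empty list if the jar was copied to that directory; an empty list suggests the jar is somewhere else.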

You can resolve the error with the following steps:

1) Download the graphframes jar that matches your Spark version (e.g. 0.7.0-spark2.4-s_2.11, since you are using Spark 2.4) from:

https://spark-packages.org/package/graphframes/graphframes

2) Add the downloaded graphframes jar to your Spark jars directory, e.g. $SPARK_HOME/jars

3) Launch pyspark with the --packages argument the first time, so that it downloads all of the jar dependencies of graphframes.

e.g. from a command prompt (on Windows, the equivalent path is %SPARK_HOME%\bin\pyspark):

$SPARK_HOME/bin/pyspark --packages graphframes:graphframes:0.7.0-spark2.4-s_2.11

4) Run the following import before you run any graph commands:

from graphframes import *

These steps should resolve your issue.
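
As an alternative to passing --packages on the command line, the same package coordinate can be set from Python through Spark's spark.jars.packages configuration property. This is a minimal sketch under stated assumptions, not the answer's own method: the helper names graphframes_coordinate and build_session and the app name are hypothetical, and the setting only takes effect if no Spark session or context is already running in the process.

```python
def graphframes_coordinate(graphframes_version, spark_version, scala_version):
    """Build the package coordinate used in step 3, e.g.
    graphframes:graphframes:0.7.0-spark2.4-s_2.11 (hypothetical helper)."""
    return "graphframes:graphframes:{}-spark{}-s_{}".format(
        graphframes_version, spark_version, scala_version
    )


def build_session(coordinate, app_name="graphframes-demo"):
    """Create a SparkSession that asks Spark to fetch the given package.

    Assumption: no session exists yet in this process; otherwise
    getOrCreate() returns the existing one and the package is not loaded.
    """
    # Imported here so the coordinate helper works without pyspark installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName(app_name)
        .config("spark.jars.packages", coordinate)
        .getOrCreate()
    )
```

For the versions in the question, build_session(graphframes_coordinate("0.7.0", "2.4", "2.11")) would be called before from graphframes import *. Whether the Python side of graphframes becomes importable this way may depend on the setup, so treat it as something to try rather than a guaranteed fix.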

8 Comments

Thank you for the reply. This is the error I am getting: Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext. : java.io.FileNotFoundException: File file:/C:/Users/Akash%20Jain/.ivy2/jars/graphframes_graphframes-0.7.0-spark2.4-s_2.11.jar does not exist at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:611) at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:824)
Any help on this? What could be the reason? Why is it looking in my user folder?
Can you please paste your new code and the steps you followed to resolve it?
My code is still the same. I followed all the steps you mentioned and then executed the code above. It says the jar does not exist at this location.
Did you include the statement "from graphframes import *" in your code? I don't see it in your original code. Also, did you execute step 3 of the solution as described?
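
On the question of why the user folder appears in the error: packages requested with --packages are cached in an Ivy directory, which by default lives under the user's home folder (.ivy2). The path in the error contains %20, which is a URL-encoded space, so one possible (unconfirmed) cause is that the space in the user folder name is not being decoded when the jar is looked up. The sketch below only decodes the reported path to make that visible; the helper name decode_file_uri_path is hypothetical.

```python
from urllib.parse import unquote


def decode_file_uri_path(path):
    """Decode percent-escapes such as %20 in a file URI path (hypothetical helper)."""
    return unquote(path)


# Path copied from the error message in the comment above.
reported = ("C:/Users/Akash%20Jain/.ivy2/jars/"
            "graphframes_graphframes-0.7.0-spark2.4-s_2.11.jar")
decoded = decode_file_uri_path(reported)

# True when the real folder name contains a space that was encoded as %20.
has_space = " " in decoded
```

If that is the cause, a workaround to try is pointing Spark's spark.jars.ivy property at a directory whose path has no spaces. That is an assumption about this particular failure, not something the thread confirms.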