34

I am attempting to run Spark GraphX with Python using pyspark. My installation appears correct, as I am able to run the pyspark tutorials and the (Java) GraphX tutorials just fine. Presumably, since GraphX is part of Spark, pyspark should be able to interface with it, correct?

Here are the tutorials for pyspark: http://spark.apache.org/docs/0.9.0/quick-start.html http://spark.apache.org/docs/0.9.0/python-programming-guide.html

Here are the ones for GraphX: http://spark.apache.org/docs/0.9.0/graphx-programming-guide.html http://ampcamp.berkeley.edu/big-data-mini-course/graph-analytics-with-graphx.html

Can anyone convert the GraphX tutorial to be in Python?


3 Answers

24

You should look at GraphFrames (https://github.com/graphframes/graphframes), which wraps GraphX algorithms under the DataFrames API and provides a Python interface.

Here is a quick example from https://graphframes.github.io/graphframes/docs/_site/quick-start.html, slightly modified so that it works.

First, start pyspark with the graphframes package loaded:

pyspark --packages graphframes:graphframes:0.1.0-spark1.6

Python code (run inside the pyspark shell, which provides sqlContext):

from graphframes import GraphFrame

# Create a Vertex DataFrame with unique ID column "id"
v = sqlContext.createDataFrame([
  ("a", "Alice", 34),
  ("b", "Bob", 36),
  ("c", "Charlie", 30),
], ["id", "name", "age"])

# Create an Edge DataFrame with "src" and "dst" columns
e = sqlContext.createDataFrame([
  ("a", "b", "friend"),
  ("b", "c", "follow"),
  ("c", "b", "follow"),
], ["src", "dst", "relationship"])
# Create a GraphFrame
g = GraphFrame(v, e)

# Query: Get in-degree of each vertex.
g.inDegrees.show()

# Query: Count the number of "follow" connections in the graph.
g.edges.filter("relationship = 'follow'").count()

# Run PageRank algorithm, and show results.
results = g.pageRank(resetProbability=0.01, maxIter=20)
results.vertices.select("id", "pagerank").show()
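
To see what these queries compute, here is a rough plain-Python sketch on the same toy graph, with no Spark involved. It is not how GraphFrames or GraphX implement anything: the `pagerank` helper is a hypothetical textbook power iteration that assumes every vertex has at least one outgoing edge, so its numbers will not match the `pagerank` column GraphFrames returns.

```python
from collections import Counter

# Same toy data as the GraphFrames example above
vertices = [("a", "Alice", 34), ("b", "Bob", 36), ("c", "Charlie", 30)]
edges = [("a", "b", "friend"), ("b", "c", "follow"), ("c", "b", "follow")]

# In-degree: number of edges pointing at each vertex.
# Vertices with no incoming edges (here "a") simply do not appear.
in_degrees = Counter(dst for _src, dst, _rel in edges)

# Count the "follow" edges
follow_count = sum(1 for _src, _dst, rel in edges if rel == "follow")


def pagerank(vertex_ids, edge_pairs, reset_probability=0.01, max_iter=20):
    """Textbook power iteration (hypothetical helper, not the GraphX code).

    Assumes every vertex has at least one outgoing edge; rank held by a
    vertex without outgoing edges would be lost in this sketch.
    """
    n = len(vertex_ids)
    out_degree = Counter(src for src, _dst in edge_pairs)
    ranks = {v: 1.0 / n for v in vertex_ids}
    for _ in range(max_iter):
        incoming = {v: 0.0 for v in vertex_ids}
        # Each vertex splits its current rank evenly across its out-edges
        for src, dst in edge_pairs:
            incoming[dst] += ranks[src] / out_degree[src]
        ranks = {
            v: reset_probability / n + (1 - reset_probability) * incoming[v]
            for v in vertex_ids
        }
    return ranks


vertex_ids = [vid for vid, _name, _age in vertices]
edge_pairs = [(src, dst) for src, dst, _rel in edges]
ranks = pagerank(vertex_ids, edge_pairs)

print(dict(in_degrees))
print(follow_count)
print(ranks)
```

On real data you would still use GraphFrames for this; the sketch is only meant to make the in-degree, filter/count, and PageRank steps concrete.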

1 Comment

You could put more explanation other than the links
21

It looks like the Python bindings for GraphX are delayed at least to Spark 1.4, then 1.5, and now indefinitely (∞). They are waiting behind the Java API.

You can track the status at SPARK-3789 ("Python bindings for GraphX") on the ASF JIRA.

2 Comments

Hi Misty, do you have any idea when it will be released? I have checked, and it is still not available, even in 1.5.1.
This is a terrible shame. It seems igraph-python is also partly dead. Is there any other option for handling large graphs in python?
3

As of Spark 0.9.0, GraphX doesn't have a Python API yet. One is expected in an upcoming release.

7 Comments

So basically GraphX is a Scala-only system since it does not have a Java API either?
AFAIK it's still Scala-only
Actually I think they do have one. See here: github.com/amplab/graphx/tree/master/python/examples
The original implementation by amplab included a couple of examples, transitive closure and PageRank, but without using the actual GraphX API, just regular PySpark API. GraphX includes a lot of handy functions and classes that are not exposed yet to Python.
I just found that it's done now: github.com/kdatta/spark/tree/SPARK-3789/python/pyspark/graphx Maybe it'll be included in the next release.
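
For anyone wondering what "transitive closure without the GraphX API" looks like, here is a minimal plain-Python sketch of the usual fixed-point idea: keep joining known paths with edges until no new pairs appear. It is not the amplab example code, and `transitive_closure` is a hypothetical helper name; with the regular PySpark API the same loop would be expressed with RDD joins and unions instead of set comprehensions.

```python
def transitive_closure(edges):
    """Return all (x, z) pairs such that z is reachable from x.

    Hypothetical helper: repeatedly "joins" known paths (x, y) with
    edges (y, z) to discover new paths (x, z), and stops once an
    iteration adds nothing new.
    """
    edges = set(edges)
    closure = set(edges)
    while True:
        new_paths = {
            (x, z)
            for (x, y) in closure
            for (y2, z) in edges
            if y == y2
        }
        updated = closure | new_paths
        if len(updated) == len(closure):
            return closure
        closure = updated


graph = {("a", "b"), ("b", "c"), ("c", "b")}
tc = transitive_closure(graph)
print(sorted(tc))
```

This is quadratic per iteration and only sensible for tiny graphs; it is here to show the logic, not as a substitute for a distributed implementation.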