
I am getting an error when running the code below, which creates a graph in Spark GraphX. I run it through spark-shell with the following command: ./bin/spark-shell -i ex.scala

Input:

My Vertex File looks like this (each line is a vertex of strings):
word1,word2,word3
word1,word2,word3
...
My Edge File looks like this: (edge from vertex 1 to vertex 2)
1,2
1,3

Code:

// Creating the vertex RDD. The input file has 300+ records, each a list of strings separated by a comma delimiter.
// zipWithIndex assigns an index number to every entry, which effectively numbers the rows.
val vRDD: RDD[(VertexId, Array[String])] = (vfile.map(line => line.split(","))).zipWithIndex().map(line => (line._2, line._1))

// Creating Edge RDD using input file
//val eRDD: RDD[Edge[Array[String]]] = (efile.map(line => line.split(",")))

val eRDD: RDD[(VertexId, VertexId)] = efile.map(line => line.split(","))

// Graph creation
val graph = Graph(vRDD, eRDD)

Error:

<console>:52: error: type mismatch;
found   : Array[String]
required: org.apache.spark.graphx.Edge[Array[String]]
          val eRDD: RDD[Edge[Array[String]]] = (efile.map(line =>    line.split(",")))

<console>:57: error: type mismatch;
 found   : org.apache.spark.rdd.RDD[(org.apache.spark.graphx.VertexId,   org.apache.spark.graphx.VertexId)]
required: org.apache.spark.rdd.RDD[org.apache.spark.graphx.Edge[?]]
Error occurred in an application involving default arguments.
       val graph = Graph(vRDD, eRDD)
  • Did you build your file? It complains about the line val eRDD: RDD[Edge[Array[String]]] = (efile.map(line => line.split(","))), which is commented out in the code above... Commented Nov 6, 2015 at 13:22
  • But aside from that, your edge RDD needs to be of type RDD[Edge], not an RDD of VertexId tuples (and a VertexId, BTW, is a Long and not a String). You should read through the documentation: spark.apache.org/docs/latest/graphx-programming-guide.html Commented Nov 6, 2015 at 13:26

2 Answers

1

The Edge has an attr -- what type is your attr? Let's assume it's an Int, and let's initialize it to zero:

Instead of this:

val eRDD: RDD[(VertexId, VertexId)] = efile.map(line => line.split(","))

Try this:

val eRDD: RDD[Edge[Int]] = efile.map { line =>
  val vs = line.split(",")
  Edge(vs(0).toLong, vs(1).toLong, 0)
}
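
If the edges really carry no attribute, a possible alternative is to keep the (VertexId, VertexId) tuples and let GraphX build the Edge objects. The sketch below uses Graph.fromEdgeTuples for that. It assumes the SparkContext sc that spark-shell provides, and it uses inline sample data in place of the edge file; the name tupleGraph is only illustrative.

```scala
import org.apache.spark.graphx.{Graph, VertexId}
import org.apache.spark.rdd.RDD

// Assumption: `sc` is the SparkContext created by spark-shell.
// Inline sample data stands in for the edge file from the question.
val efile: RDD[String] = sc.parallelize(Seq("1,2", "1,3"))

// Parse each "src,dst" line into a pair of Long vertex ids.
val pairs: RDD[(VertexId, VertexId)] = efile.map { line =>
  val Array(src, dst) = line.split(",", 2).map(_.trim)
  (src.toLong, dst.toLong)
}

// fromEdgeTuples sets every edge attribute to 1 and gives each vertex
// mentioned by an edge the supplied default value (0 here).
val tupleGraph: Graph[Int, Int] = Graph.fromEdgeTuples(pairs, defaultValue = 0)
```

Note that a graph built this way contains only the vertices that appear in some edge, so the word lists from the vertex file would still have to be attached separately.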

0

Based on the example you gave, I created two files with vertices and edges:

import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD
val vfile = sc.textFile("vertices.txt")
val efile = sc.textFile("edges.txt")

Then you create your RDDs of vertices and edges:

val vRDD: RDD[(VertexId, Array[String])] = vfile.map(line => line.split(","))
                               .zipWithIndex()
                               .map(_.swap) // swap is a shorter equivalent of map(line => (line._2, line._1))

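// Creating Edge RDD using input file.
// No attribute is passed to Edge, so attr keeps its default value (null);
// the declared attribute type (VertexId, VertexId) is only a placeholder.
val eRDD: RDD[Edge[(VertexId, VertexId)]] = efile.map(line => {
  line.split(",", 2) match {
    case Array(n1, n2) => Edge(n1.toLong, n2.toLong)
  }
})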

Once you have created your vertex and edge RDDs, you can create your graph:

val graph = Graph(vRDD, eRDD)
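
The steps above can be put together into one script that runs with ./bin/spark-shell -i. This is only a sketch under some assumptions: it uses inline sample data instead of vertices.txt and edges.txt, it picks an Int edge attribute of 0 as a placeholder, and it relies on the SparkContext sc from spark-shell. Because zipWithIndex numbers rows from 0, the ids in the edge data must refer to those 0-based row indexes.

```scala
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

// Assumption: `sc` is the SparkContext created by spark-shell.
// Inline sample data replaces vertices.txt and edges.txt.
val vfile: RDD[String] = sc.parallelize(Seq("a,b,c", "d,e,f", "g,h,i", "j,k,l"))
val efile: RDD[String] = sc.parallelize(Seq("1,2", "1,3"))

// zipWithIndex numbers rows from 0, so the vertex ids here are 0, 1, 2 and 3.
val vRDD: RDD[(VertexId, Array[String])] =
  vfile.map(_.split(",")).zipWithIndex().map(_.swap)

// Keep only lines with exactly two fields; the Int attribute 0 is a placeholder.
val eRDD: RDD[Edge[Int]] = efile.map(_.split(",").map(_.trim)).collect {
  case Array(src, dst) => Edge(src.toLong, dst.toLong, 0)
}

val graph: Graph[Array[String], Int] = Graph(vRDD, eRDD)

// Print each edge together with the words stored on its two endpoints.
graph.triplets.collect().foreach { t =>
  println(s"${t.srcId} [${t.srcAttr.mkString(" ")}] -> ${t.dstId} [${t.dstAttr.mkString(" ")}]")
}
```

Filtering with a partial function skips malformed lines instead of failing with a MatchError, which may or may not be what you want for your data.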
