I am new to Scala and GraphX and am having trouble converting a TSV file to a graph. I have a flat tab-separated file like the one below:

n1 P1 n2
n3 P1 n4
n2 P2 n3
n3 P2 n1
n1 P3 n4
n3 P3 n2

where n1, n2, n3, n4 are the nodes of the graph and P1, P2, P3 are the properties that should form the edges between the nodes.

How can I construct a graph from the above file in Spark GraphX? Example code would be very helpful.

1 Answer

Here is some code for you (you will need to build it into a jar file using sbt):

package vinnie.pooh

import org.apache.spark.SparkContext._
import org.apache.spark._
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD


object Main {
  def main(args: Array[String]): Unit = {

    if (args.length != 1) {
      System.err.println(
        "Should be one parameter: <path/to/edges>")
      System.exit(1)
    }

    val conf = new SparkConf()
      .setAppName("Load graph")
      .setSparkHome(System.getenv("SPARK_HOME"))
      .setJars(SparkContext.jarOfClass(this.getClass).toList)

    val sc = new SparkContext(conf)

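    // Each line is "<srcId> <property> <dstId>"; vertex IDs must be numeric (Long).
    val edges: RDD[Edge[String]] =
      sc.textFile(args(0)).map { line =>
        // Split on any whitespace so both space- and tab-separated files work.
        val fields = line.trim.split("\\s+")
        Edge(fields(0).toLong, fields(2).toLong, fields(1))
      }

    // Vertices are created from the edge endpoints, each with the default attribute.
    val graph: Graph[String, String] = Graph.fromEdges(edges, "defaultProperty")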


    println("num edges = " + graph.numEdges)
    println("num vertices = " + graph.numVertices)
  }
}

I used the following edge.txt as input:

1 Prop12 2
2 Prop24 4
4 Prop45 5
5 Prop52 2
6 Prop65 7

Then, for example, you can launch it locally:

$SPARK_HOME>./bin/spark-submit --class vinnie.pooh.Main --master local[2] ~/justBuiltJar.jar ~/edge.txt
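
Note that the program above calls toLong on the node fields, so it expects numeric vertex IDs like those in edge.txt. The file in the question uses names such as n1 and n2, which would first need to be mapped to Long IDs. A minimal sketch of one way to do that in plain Scala is shown here; the object name TsvGraphInput, the Parsed case class and the "IDs in order of first appearance" scheme are all assumptions made for illustration, not part of GraphX:

```scala
// Sketch: turn "src<TAB>property<TAB>dst" lines with string node names into
// numeric vertex IDs plus edge triples, the shape GraphX works with.
// TsvGraphInput and Parsed are hypothetical names chosen for this example.
object TsvGraphInput {
  // vertices: (vertexId, nodeName); edges: (srcId, dstId, property)
  final case class Parsed(
      vertices: Seq[(Long, String)],
      edges: Seq[(Long, Long, String)])

  def parse(lines: Seq[String]): Parsed = {
    // Keep only lines with exactly three tab-separated fields.
    val triples = lines.map(_.trim).filter(_.nonEmpty).map(_.split("\t")).collect {
      case Array(src, prop, dst) => (src, prop, dst)
    }
    // Assign IDs 0, 1, 2, ... to node names in order of first appearance.
    val names = triples.flatMap { case (s, _, d) => Seq(s, d) }.distinct
    val idOf: Map[String, Long] =
      names.zipWithIndex.map { case (n, i) => n -> i.toLong }.toMap
    Parsed(
      vertices = names.map(n => (idOf(n), n)),
      edges = triples.map { case (s, p, d) => (idOf(s), idOf(d), p) })
  }
}
```

With a SparkContext sc in scope, the result could then be handed to GraphX along the lines of Graph(sc.parallelize(parsed.vertices), sc.parallelize(parsed.edges.map { case (s, d, p) => Edge(s, d, p) })), which keeps the node name as the vertex attribute. This builds the name-to-ID map on the driver, so it only suits files that fit in memory there; a larger input would need the mapping done on RDDs instead.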
