
Objective

Understand the cause of and the solution to the problem below, which happens when using spark-submit. Any help is appreciated.

spark-submit --class "AuctionDataFrame" --master spark://<hostname>:7077 auction-project_2.11-1.0.jar

The same code does not cause an error when run line by line in spark-shell.

...
scala>     val auctionsDF = auctionsRDD.toDF()
auctionsDF: org.apache.spark.sql.DataFrame = [aucid: string, bid: float, bidtime: float, bidder: string, bidrate: int, openbid: float, price: float, itemtype: string, dtl: int]
scala> auctionsDF.printSchema()
root
 |-- aucid: string (nullable = true)
 |-- bid: float (nullable = false)
 |-- bidtime: float (nullable = false)
 |-- bidder: string (nullable = true)
 |-- bidrate: integer (nullable = false)
 |-- openbid: float (nullable = false)
 |-- price: float (nullable = false)
 |-- itemtype: string (nullable = true)
 |-- dtl: integer (nullable = false)

Problem

Calling the toDF method to convert the RDD into a DataFrame causes the error below.

Exception in thread "main" java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)Lscala/reflect/api/JavaMirrors$JavaMirror;
    at AuctionDataFrame$.main(AuctionDataFrame.scala:52)
    at AuctionDataFrame.main(AuctionDataFrame.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Code

import org.apache.spark.{SparkConf, SparkContext}

case class Auctions(
  aucid: String,
  bid: Float,
  bidtime: Float,
  bidder: String,
  bidrate: Int,
  openbid: Float,
  price: Float,
  itemtype: String,
  dtl: Int)

object AuctionDataFrame {
  val AUCID = 0
  val BID = 1
  val BIDTIME = 2
  val BIDDER = 3
  val BIDRATE = 4
  val OPENBID = 5
  val PRICE = 6
  val ITEMTYPE = 7
  val DTL = 8

  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("AuctionDataFrame")
    val sc = new SparkContext(conf)
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    import sqlContext.implicits._

    val inputRDD = sc.textFile("/user/wynadmin/auctiondata.csv").map(_.split(","))
    val auctionsRDD = inputRDD.map(a =>
      Auctions(
        a(AUCID),
        a(BID).toFloat,
        a(BIDTIME).toFloat,
        a(BIDDER),
        a(BIDRATE).toInt,
        a(OPENBID).toFloat,
        a(PRICE).toFloat,
        a(ITEMTYPE),
        a(DTL).toInt))
    val auctionsDF = auctionsRDD.toDF()  // <--- line 52 causing the error.
  }
}
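
For reference, a minimal sketch of the same conversion that avoids the reflection-based toDF implicit, building Rows against an explicit StructType (it assumes the same sqlContext, inputRDD, and index constants as above):

import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

// Explicit schema instead of case-class reflection.
val schema = StructType(Seq(
  StructField("aucid", StringType),
  StructField("bid", FloatType),
  StructField("bidtime", FloatType),
  StructField("bidder", StringType),
  StructField("bidrate", IntegerType),
  StructField("openbid", FloatType),
  StructField("price", FloatType),
  StructField("itemtype", StringType),
  StructField("dtl", IntegerType)))

// Build Rows from the split CSV fields and create the DataFrame directly.
val rowRDD = inputRDD.map(a => Row(
  a(AUCID), a(BID).toFloat, a(BIDTIME).toFloat, a(BIDDER), a(BIDRATE).toInt,
  a(OPENBID).toFloat, a(PRICE).toFloat, a(ITEMTYPE), a(DTL).toInt))

val auctionsDF = sqlContext.createDataFrame(rowRDD, schema)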

build.sbt

name := "Auction Project"

version := "1.0"

scalaVersion := "2.11.8"
//scalaVersion := "2.10.6"

/* 
libraryDependencies ++= Seq(
    "org.apache.spark" %% "spark-core" % "1.6.2",
    "org.apache.spark" %% "spark-sql" % "1.6.2",
    "org.apache.spark" %% "spark-mllib" % "1.6.2"
)
*/

libraryDependencies ++= Seq(
    "org.apache.spark" %% "spark-core" % "1.6.2" % "provided",
    "org.apache.spark" %% "spark-sql" % "1.6.2" % "provided",
    "org.apache.spark" %% "spark-mllib" % "1.6.2" % "provided"
)
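
Note that %% appends the project's Scala binary version to the artifact name, so with scalaVersion := "2.11.8" the lines above resolve to the _2.11 Spark artifacts. The equivalent form with the suffix written out explicitly would be:

libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "1.6.2" % "provided"

This is also why the jar produced by sbt package is named auction-project_2.11-1.0.jar.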

Environment

Spark on Ubuntu 14.04:

      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.2
      /_/

Using Scala version 2.11.7 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_92)

sbt on Windows:

D:\>sbt sbtVersion
[info] Set current project to root (in build file:/D:/)
[info] 0.13.12

Research

I looked into similar issues, which suggest an incompatibility between the Scala version used to build the application and the Scala version Spark was compiled with.

Hence I changed the Scala version in build.sbt to 2.10, which produced a 2.10 jar, but the error persisted. Using % "provided" or not makes no difference to the error.

scalaVersion := "2.10.6"
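
As a quick diagnostic, the Scala library that actually ends up on the driver's classpath can be printed at the top of main, before calling toDF (a sketch; not in the code above). A mismatch with the version the jar was compiled against points to the same kind of binary incompatibility:

// Prints e.g. "version 2.11.7" for the scala-library actually loaded at runtime.
println(scala.util.Properties.versionString)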
    Possible duplicate of Exception in thread "main" java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror Commented Sep 14, 2016 at 7:11
  • Still looks like a version issue. Check carefully what version's being used everywhere... Commented Sep 14, 2016 at 7:25
  • @TzachZohar, thanks for the comment, but I changed the Scala version to "2.10.6", ran "sbt clean" and then "sbt package" again, which did not solve the issue. Could you be more specific about how the linked question solves this? Commented Sep 14, 2016 at 23:27

1 Answer


Cause

Spark 1.6.2 had been compiled from source with Scala 2.11. However, the pre-built spark-1.6.2-bin-without-hadoop.tgz had also been downloaded and placed in the lib/ directory.

I believe that because spark-1.6.2-bin-without-hadoop.tgz was compiled with Scala 2.10, it caused the binary incompatibility.

Fix

Remove spark-1.6.2-bin-without-hadoop.tgz from the lib/ directory and run "sbt package" with the library dependencies below.

libraryDependencies ++= Seq(
    "org.apache.spark" %% "spark-core" % "1.6.2" % "provided",
    "org.apache.spark" %% "spark-sql" % "1.6.2" % "provided",
    "org.apache.spark" %% "spark-mllib" % "1.6.2" % "provided"
)
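
With the conflicting archive removed from lib/, the rebuild and resubmit sequence is the same as in the question:

sbt clean package
spark-submit --class "AuctionDataFrame" --master spark://<hostname>:7077 auction-project_2.11-1.0.jar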