
I'm new to Spark and Scala. I'm trying to run a very simple Spark program from IntelliJ IDEA. All it does is:

  1. Connect to a specific collection in a MongoDB database
  2. Load the data
  3. Print the first record.

It was working fine, but now it is throwing the error:

org.bson.codecs.configuration.CodecConfigurationException: Can't find a codec for class java.lang.Class.

Here is my code:

import org.apache.spark.{SparkConf, SparkContext}
import com.mongodb.spark._
import com.mongodb.spark.rdd.MongoRDD
import org.bson.Document
import com.mongodb.spark.config._
import org.apache.spark.sql.SQLContext
import com.mongodb.spark.sql._
import scala.reflect.runtime.universe._

object Analytics1 {
  def main(args: Array[String]) {
    val conf = new SparkConf()
      .setAppName("Analytics1")
      .setMaster("local")
      .set("spark.mongodb.input.uri", "mongodb://192.168.56.1:27017/events.entEvent")
      .set("spark.mongodb.output.uri", "mongodb://192.168.56.1:27017/events.entResult")
    val sc = new SparkContext(conf)

    val rdd = sc.loadFromMongoDB()
    println(rdd.first())

    sc.stop()
  }
}

Here is my build.sbt. If I use the latest version of Spark, it throws this error:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/sql/DataFrame

so I'm using 1.6.1, which was working fine until a few days ago, but now it throws the

Can't find a codec for class java.lang.Class

error. Could someone please help so I can get moving? Since this is very basic, I'm hopeful someone can offer some advice and get me unblocked.

Thanks.

    name := "Simple Project"

version := "1.0"

scalaVersion := "2.11.7"

// libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.1"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.1"

//libraryDependencies += "org.apache.spark" % "spark-mllib_2.11" % "1.6.1"
libraryDependencies += "org.apache.spark" %% "spark-mllib" % "1.6.1"

//libraryDependencies += "org.mongodb.spark" % "mongo-spark-connector_2.10" % "1.1.0"
libraryDependencies += "org.mongodb.spark" %% "mongo-spark-connector" % "1.1.0"

libraryDependencies += "org.mongodb.scala" %% "mongo-scala-driver" % "1.2.1"

libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.6.1"

resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
resolvers += "snapshots" at "https://oss.sonatype.org/content/repositories/snapshots/ "
resolvers += "releases"  at "https://oss.sonatype.org/content/repositories/releases/"
  • People looking for answers on this issue, please read all the comments by Wan below. Commented Jan 6, 2017 at 18:59

1 Answer


libraryDependencies += "org.mongodb.spark" % "mongo-spark-connector_2.10" % "1.1.0"

You're loading the MongoDB Connector for Spark built for Scala 2.10, but your project is using Scala 2.11.7, including the mongo-scala-driver.

Swap the line above to:

libraryDependencies += "org.mongodb.spark" % "mongo-spark-connector_2.11" % "1.1.0"

Alternatively, use the %% shortcut, which automatically picks the artifact matching your project's Scala version:

libraryDependencies += "org.mongodb.spark" %% "mongo-spark-connector" % "1.1.0"

See more about SBT dependencies: Getting the right Scala version
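
For reference, a trimmed build.sbt along these lines keeps every artifact on Scala 2.11 (a sketch only, using the versions from the question; the mllib and mongo-scala-driver entries are dropped since the comments below suggest they aren't needed for this example):

name := "Simple Project"

version := "1.0"

scalaVersion := "2.11.7"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.1"

libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.6.1"

libraryDependencies += "org.mongodb.spark" %% "mongo-spark-connector" % "1.1.0"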


7 Comments

Thanks Wan. I tried both options, but no success. I even tried the mllib dependency with the %% shortcut; that did not help either. My guess is this has something to do with dependencies and/or imports that I'm not able to figure out.
I think the problem is println. If I use rdd.toDF().show(10) instead of println(rdd.first()), it works, but the data is shown as a DataFrame. I want it as RDD output. Any ideas on how to resolve this?
You need to describe the problem you're having, with error messages etc. Are you saying the java.lang.Class codec error is gone now, but you're getting another error with println? Make a smaller script, test and debug. Post any error messages and a small reproducible script.
When I use println(rdd.first()) it throws the java.lang.Class codec error. If I use rdd.toDF().show(1), it works fine, but the problem with this method is that it returns the result in DataFrame format; I need it in RDD format. A simpler script? I'm just declaring the config parameters for Mongo, the Spark context and the RDD, then printing the first record. What is simpler than this?
1) Update the build.sbt in your post to reflect the changes you've made. 2) You should only need 3 dependencies for what you're trying to do: spark-core, spark-sql and mongo-spark-connector. Try it in that order. 3) Remove unnecessary import statements. You should only need org.apache.spark.{SparkConf, SparkContext}, org.mongodb.spark._, com.mongodb.spark.config._ and org.bson.Document. 4) Do an sbt clean or remove previous builds, as you are still sourcing the previously incorrect versions of the dependencies. 5) If you're still having problems, post the exact error logs.
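
Putting the suggestions in the comment above together, a minimal test script could look roughly like this (a sketch only; the URI and the loadFromMongoDB() call are reused from the question, and org.bson.Document is only needed if you handle the returned documents explicitly):

import org.apache.spark.{SparkConf, SparkContext}
import com.mongodb.spark._
import com.mongodb.spark.config._
import org.bson.Document

object Analytics1 {
  def main(args: Array[String]): Unit = {
    // only the input URI is needed to read the collection
    val conf = new SparkConf()
      .setAppName("Analytics1")
      .setMaster("local")
      .set("spark.mongodb.input.uri", "mongodb://192.168.56.1:27017/events.entEvent")

    val sc = new SparkContext(conf)

    // loadFromMongoDB() is provided by the implicits pulled in with com.mongodb.spark._
    val rdd = sc.loadFromMongoDB()
    println(rdd.first())

    sc.stop()
  }
}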