
I'm new to Spark and Scala. I'm trying to run a very simple Spark program from IntelliJ IDEA. All it does is:

  1. Connect to a specific collection in a MongoDB database
  2. Load the data
  3. Print the first record.

It was working fine, but now it is throwing the error:

org.bson.codecs.configuration.CodecConfigurationException: Can't find a codec for class java.lang.Class.

Here is my code:

import org.apache.spark.{SparkConf, SparkContext}
import com.mongodb.spark._
import com.mongodb.spark.rdd.MongoRDD
import org.bson.Document
import com.mongodb.spark.config._
import org.apache.spark.sql.SQLContext
import com.mongodb.spark.sql._
import scala.reflect.runtime.universe._

object Analytics1 {
  def main(args: Array[String]) {
    val conf = new SparkConf()
      .setAppName("Analytics1")
      .setMaster("local")
      .set("spark.mongodb.input.uri", "mongodb://192.168.56.1:27017/events.entEvent")
      .set("spark.mongodb.output.uri", "mongodb://192.168.56.1:27017/events.entResult")
    val sc = new SparkContext(conf)

    val rdd = sc.loadFromMongoDB()
    println(rdd.first())

    sc.stop()
  }
}

Here is my build.sbt. If I use the latest version of Spark, it throws this error:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/sql/DataFrame

so I'm using 1.6.1, which was working fine until a few days ago, but now it throws the

Can't find a codec for class java.lang.Class

error. Could someone please help so I can get moving? Since this is very basic, I'm hopeful someone can offer some advice and get me unblocked.

Thanks.

    name := "Simple Project"

version := "1.0"

scalaVersion := "2.11.7"

// libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.1"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.1"

//libraryDependencies += "org.apache.spark" % "spark-mllib_2.11" % "1.6.1"
libraryDependencies += "org.apache.spark" %% "spark-mllib" % "1.6.1"

//libraryDependencies += "org.mongodb.spark" % "mongo-spark-connector_2.10" % "1.1.0"
libraryDependencies += "org.mongodb.spark" %% "mongo-spark-connector" % "1.1.0"

libraryDependencies += "org.mongodb.scala" %% "mongo-scala-driver" % "1.2.1"

libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.6.1"

resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
resolvers += "snapshots" at "https://oss.sonatype.org/content/repositories/snapshots/ "
resolvers += "releases"  at "https://oss.sonatype.org/content/repositories/releases/"
  • People looking for answers on this issue, please read all the comments by Wan below. Commented Jan 6, 2017 at 18:59

1 Answer


libraryDependencies += "org.mongodb.spark" % "mongo-spark-connector_2.10" % "1.1.0"

You're loading the MongoDB Connector for Spark built for Scala 2.10, but your project is using Scala 2.11.7, including the mongo-scala-driver.

Swap the line above to:

libraryDependencies += "org.mongodb.spark" % "mongo-spark-connector_2.11" % "1.1.0"

Alternatively, use the %% shortcut, which automatically picks the artifact matching your project's Scala version:

libraryDependencies += "org.mongodb.spark" %% "mongo-spark-connector" % "1.1.0"

See more about SBT dependencies: Getting the right Scala version
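
For reference, a trimmed build.sbt along these lines keeps every artifact on Scala 2.11 (a sketch only, using the versions from the question; the mllib and mongo-scala-driver entries are dropped since the comments below suggest they aren't needed for this example):

name := "Simple Project"

version := "1.0"

scalaVersion := "2.11.7"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.1"

libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.6.1"

libraryDependencies += "org.mongodb.spark" %% "mongo-spark-connector" % "1.1.0"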


7 Comments

Thanks Wan. I tried both options, but no success. I even tried the mllib dependency with the %% shortcut; that did not help either. My guess is this has something to do with dependencies and/or imports that I'm not able to figure out.
I think the problem is println. If I use rdd.toDF().show(10) instead of println(rdd.first()), it works, but the data is shown as a DataFrame. I want it as RDD output. Any ideas on how to resolve this?
You need to describe the problem you're having, with error messages etc. Are you saying the java.lang.Class codec error is gone now, but you're getting another error with println? Make a smaller script, test and debug. Post any error messages and a small reproducible script.
When I use println(rdd.first()) it throws the java.lang.Class codec error. If I use rdd.toDF().show(1), it works fine, but the problem with this method is that it returns the result in DataFrame format; I need it in RDD format. A simpler script? I'm just declaring the config parameters for Mongo, the Spark context and the RDD, then printing the first record. What is simpler than this?
1) Update the build.sbt in your post to reflect the changes you've made. 2) You should only need 3 dependencies for what you're trying to do: spark-core, spark-sql and mongo-spark-connector. Try it in that order. 3) Remove unnecessary import statements. You should only need org.apache.spark.{SparkConf, SparkContext}, org.mongodb.spark._, com.mongodb.spark.config._ and org.bson.Document. 4) Do an sbt clean or remove previous builds, as you are still sourcing the previously incorrect versions of the dependencies. 5) If you're still having problems, post the exact error logs.
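
Putting the suggestions in the comment above together, a minimal test script could look roughly like this (a sketch only; the URI and the loadFromMongoDB() call are reused from the question, and org.bson.Document is only needed if you handle the returned documents explicitly):

import org.apache.spark.{SparkConf, SparkContext}
import com.mongodb.spark._
import com.mongodb.spark.config._
import org.bson.Document

object Analytics1 {
  def main(args: Array[String]): Unit = {
    // only the input URI is needed to read the collection
    val conf = new SparkConf()
      .setAppName("Analytics1")
      .setMaster("local")
      .set("spark.mongodb.input.uri", "mongodb://192.168.56.1:27017/events.entEvent")

    val sc = new SparkContext(conf)

    // loadFromMongoDB() is provided by the implicits pulled in with com.mongodb.spark._
    val rdd = sc.loadFromMongoDB()
    println(rdd.first())

    sc.stop()
  }
}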