
I am trying to insert a Spark SQL DataFrame into a remote MongoDB collection. I previously wrote a Java program using MongoClient to check whether the remote collection is accessible, and that worked successfully.

My current Spark code is as follows:

scala> val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
warning: there was one deprecation warning; re-run with -deprecation for details
sqlContext: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@1a8b22b5
scala> val depts = sqlContext.sql("select * from test.user_details")
depts: org.apache.spark.sql.DataFrame = [user_id: string, profile_name: string ... 7 more fields]
scala> depts.write.options(scala.collection.Map("uri" -> "mongodb://<username>:<pwd>@<hostname>:27017/<dbname>.<collection>")).mode(SaveMode.Overwrite).format("com.mongodb.spark.sql").save()

This gives the following error:

java.lang.AbstractMethodError: com.mongodb.spark.sql.DefaultSource.createRelation(Lorg/apache/spark/sql/SQLContext;Lorg/apache/spark/sql/SaveMode;Lscala/collection/immutable/Map;Lorg/apache/spark/sql/Dataset;)Lorg/apache/spark/sql/sources/BaseRelation;
  at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:429)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:211)
  ... 84 elided

I also tried the following, which throws the error below:

scala> depts.write.options(scala.collection.Map("uri" -> "mongodb://<username>:<pwd>@<host>:27017/<database>.<collection>")).mode(SaveMode.Overwrite).save()
java.lang.IllegalArgumentException: 'path' is not specified
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$17.apply(DataSource.scala:438)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$17.apply(DataSource.scala:438)
  at scala.collection.MapLike$class.getOrElse(MapLike.scala:128)
  at org.apache.spark.sql.execution.datasources.CaseInsensitiveMap.getOrElse(ddl.scala:117)
  at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:437)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:211)
  ... 58 elided

I have imported the following packages:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import com.mongodb.casbah.{WriteConcern => MongodbWriteConcern}
import com.mongodb.spark.config._
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql._

depts.show() works as expected, i.e. the DataFrame is created successfully.

Can someone please provide any advice or suggestions on this? Thanks.

2 Answers


Assuming that you are using MongoDB Spark Connector v1.0, you can save a DataFrame produced with Spark SQL as follows:

// DataFrames SQL example
import com.mongodb.spark.MongoSpark
df.registerTempTable("temporary")
val depts = sqlContext.sql("select * from temporary")
depts.show()
// Save out the filtered DataFrame result
MongoSpark.save(depts.write.option("uri", "mongodb://hostname:27017/database.collection").mode("overwrite"))
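
If you prefer to build the output configuration explicitly, MongoSpark.save can also take the DataFrame together with a WriteConfig. This is only a minimal sketch, not part of the original answer: the host, database, and collection in the uri are placeholders, and it assumes a connector version whose WriteConfig can read the target database and collection from the uri option:

import com.mongodb.spark.MongoSpark
import com.mongodb.spark.config.WriteConfig

// Build the output configuration from explicit options.
// The uri is a placeholder and names the target database and collection.
val writeConfig = WriteConfig(Map(
  "uri" -> "mongodb://hostname:27017/database.collection",
  "writeConcern.w" -> "majority"))

// Save the DataFrame through the connector with that configuration.
MongoSpark.save(depts, writeConfig)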

For more information see MongoDB Spark Connector: Spark SQL

For a simple demo of MongoDB and Spark using Docker, see MongoDB Spark Docker: examples.scala - dataframes




Have a look at this error and consider what could cause it. It is due to a version mismatch between the MongoDB Spark Connector and the Spark version you use.

java.lang.AbstractMethodError: com.mongodb.spark.sql.DefaultSource.createRelation(Lorg/apache/spark/sql/SQLContext;Lorg/apache/spark/sql/SaveMode;Lscala/collection/immutable/Map;Lorg/apache/spark/sql/Dataset;)Lorg/apache/spark/sql/sources/BaseRelation;

Quoting the javadoc of java.lang.AbstractMethodError:

Thrown when an application tries to call an abstract method. Normally, this error is caught by the compiler; this error can only occur at run time if the definition of some class has incompatibly changed since the currently executing method was last compiled.

That pretty much explains what you experience (note the part that starts with "this error can only occur at run time").

My guess is that the part Lorg/apache/spark/sql/Dataset in the DefaultSource.createRelation method in the stack trace is exactly the culprit.

In other words, that signature declares data: DataFrame, not Dataset. The two are incompatible in this direction: DataFrame is simply a Scala type alias for Dataset[Row], but an arbitrary Dataset is not a DataFrame, hence the runtime error.

override def createRelation(sqlContext: SQLContext, mode: SaveMode, parameters: Map[String, String], data: DataFrame): BaseRelation
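
The practical fix is to use a connector artifact built for the same Spark and Scala versions as your cluster. The exact coordinates below are an assumption (a connector release targeting Spark 2.0.x with Scala 2.11); substitute whatever matches your installation:

// build.sbt: keep the connector in step with your Spark and Scala versions
libraryDependencies += "org.mongodb.spark" %% "mongo-spark-connector" % "2.0.0"

// or pull the matching artifact directly when starting the shell:
// spark-shell --packages org.mongodb.spark:mongo-spark-connector_2.11:2.0.0

With matching versions, the depts.write...format("com.mongodb.spark.sql").save() call from the question should resolve the expected createRelation overload instead of hitting the AbstractMethodError.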

