I'm trying to save a PySpark dataframe to MongoDB from a Google Cloud Dataproc cluster, but the write keeps failing with an error.
I'm using Spark 2.4.7, Python 3.7, and the MongoDB Spark connector 2.4.3.
Here is my code:
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .master("yarn") \
    .appName("demo") \
    .config("spark.mongodb.input.uri",
            "mongodb+srv://my_host:27017/people_db") \
    .config("spark.mongodb.output.uri",
            "mongodb+srv://my_host:27017/people_db") \
    .config("spark.jars.packages",
            "org.mongodb.spark:mongo-spark-connector_2.12-2.4.3") \
    .getOrCreate()

df = spark.read \
    .format("csv") \
    .options(header=True) \
    .load(csv_path)

# ---------- Some data processing -----------

# This is the block of code that raises the error
df.write \
    .format("com.mongodb.spark.sql.DefaultSource") \
    .mode("append") \
    .option("collection", "people") \
    .save()
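For context, as I understand the connector's write options, the target database comes from spark.mongodb.output.uri and only the collection is passed to the writer above. Below is a minimal sketch of the same append with both database and collection named explicitly (names taken from the URIs above; I have not confirmed this variant on the cluster):

# Sketch only: same append to people_db.people, with the database and
# collection given as explicit write options instead of being inferred
# from spark.mongodb.output.uri. Reuses the df built above.
df.write \
    .format("com.mongodb.spark.sql.DefaultSource") \
    .mode("append") \
    .option("database", "people_db") \
    .option("collection", "people") \
    .save()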
Here is the error message:

A comment on the question: ConnectionString cannot be found from your classpath. I don't believe Dataproc manages MongoDB-related dependencies, so a conflict is unlikely. Is the same Spark application running fine on a non-Dataproc cluster? What if you add the mongo-java-driver artifact from search.maven.org/remotecontent?filepath=org/mongodb/spark/… to your Spark packages list as well?
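If I follow that suggestion correctly, it means listing the Java driver next to the connector in spark.jars.packages as comma-separated Maven coordinates (groupId:artifactId:version). A minimal sketch, where the connector artifact mirrors the one in my snippet and the mongo-java-driver version (3.12.10) is my own assumption, not one the comment named:

# Sketch of the suggested configuration: pull in mongo-java-driver explicitly
# alongside the connector via spark.jars.packages. Coordinates are
# comma-separated, in groupId:artifactId:version form; the _2.12 suffix is
# assumed to match the cluster's Scala build.
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("demo") \
    .config("spark.jars.packages",
            "org.mongodb.spark:mongo-spark-connector_2.12:2.4.3,"
            "org.mongodb:mongo-java-driver:3.12.10") \
    .getOrCreate()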