I'm trying to save a PySpark dataframe to MongoDB from a Google Cloud Dataproc cluster, but the write keeps failing with an error.
I'm using Spark 2.4.7, Python 3.7, and the MongoDB Spark connector 2.4.3.
Here is my code:
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .master("yarn") \
    .appName("demo") \
    .config("spark.mongodb.input.uri",
            "mongodb+srv://my_host:27017/people_db") \
    .config("spark.mongodb.output.uri",
            "mongodb+srv://my_host:27017/people_db") \
    .config("spark.jars.packages",
            "org.mongodb.spark:mongo-spark-connector_2.12-2.4.3") \
    .getOrCreate()

df = spark.read \
    .format("csv") \
    .options(header=True) \
    .load(csv_path)

# ---------- Some data processing -----------

# This is the block of code that raises the error
df.write \
    .format("com.mongodb.spark.sql.DefaultSource") \
    .mode("append") \
    .option("collection", "people") \
    .save()
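For context, as I understand the connector's write options, the target database comes from spark.mongodb.output.uri and only the collection is passed to the writer above. Below is a minimal sketch of the same append with both database and collection named explicitly (names taken from the URIs above; I have not confirmed this variant on the cluster):

# Sketch only: same append to people_db.people, with the database and
# collection given as explicit write options instead of being inferred
# from spark.mongodb.output.uri. Reuses the df built above.
df.write \
    .format("com.mongodb.spark.sql.DefaultSource") \
    .mode("append") \
    .option("database", "people_db") \
    .option("collection", "people") \
    .save()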
Here is the error message:

A comment on the question: ConnectionString cannot be found from your classpath. I don't believe Dataproc manages MongoDB-related dependencies, so a conflict is unlikely. Is the same Spark application running fine on a non-Dataproc cluster? What if you add the mongo-java-driver artifact from search.maven.org/remotecontent?filepath=org/mongodb/spark/… to your Spark packages list as well?
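If I follow that suggestion correctly, it means listing the Java driver next to the connector in spark.jars.packages as comma-separated Maven coordinates (groupId:artifactId:version). A minimal sketch, where the connector artifact mirrors the one in my snippet and the mongo-java-driver version (3.12.10) is my own assumption, not one the comment named:

# Sketch of the suggested configuration: pull in mongo-java-driver explicitly
# alongside the connector via spark.jars.packages. Coordinates are
# comma-separated, in groupId:artifactId:version form; the _2.12 suffix is
# assumed to match the cluster's Scala build.
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("demo") \
    .config("spark.jars.packages",
            "org.mongodb.spark:mongo-spark-connector_2.12:2.4.3,"
            "org.mongodb:mongo-java-driver:3.12.10") \
    .getOrCreate()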