I am trying to load my PostgreSQL tables into Spark. I have successfully read a table from PostgreSQL into Spark using JDBC. I also have code written in R that I want to run on that table, but I cannot access the data from R.

I am using the following code to connect:

    val pgDF_table = spark.read
      .format("jdbc")
      .option("driver", "org.postgresql.Driver")
      .option("url", "jdbc:postgresql://10.128.0.4:5432/sparkDB")
      .option("dbtable", "survey_results")
      .option("user", "prashant")
      .option("password", "pandey")
      .load()

    pgDF_table.show

Is there any option like spark.write for writing the data back?

  • You may find spark.rstudio.com helpful. It allows you to define Spark jobs directly from R, rather than Scala, and bring data from Spark into R for further processing. Commented Jul 29, 2019 at 10:54
  • But my data is in PostgreSQL; there is nothing related to it there. Commented Jul 29, 2019 at 11:11

1 Answer


In SparkR, you can read data from JDBC using the following function:

read.jdbc(url, tableName, partitionColumn = NULL, lowerBound = NULL,
  upperBound = NULL, numPartitions = 0L, predicates = list(), ...)

Arguments

`url`: JDBC database URL of the form `jdbc:subprotocol:subname`

`tableName`: the name of the table in the external database

`partitionColumn`: the name of a column of integral type that will be used for partitioning

`lowerBound`: the minimum value of `partitionColumn` used to decide partition stride

`upperBound`: the maximum value of `partitionColumn` used to decide partition stride

`numPartitions`: the number of partitions. This, along with `lowerBound` (inclusive) and `upperBound` (exclusive), forms partition strides for the generated WHERE clause expressions used to split the column `partitionColumn` evenly. Defaults to SparkContext.defaultParallelism when unset.

`predicates`: a list of conditions in the WHERE clause; each one defines one partition
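For example, here is a minimal sketch against the database from the question (the URL, table name, and credentials are copied from the question; a running SparkR session with the PostgreSQL driver on the classpath is assumed, see the classpath note at the end of this answer):

    library(SparkR)
    sparkR.session()

    # Plain read: driver, user and password are passed through `...`
    # as JDBC connection properties.
    df <- read.jdbc("jdbc:postgresql://10.128.0.4:5432/sparkDB",
                    "survey_results",
                    driver = "org.postgresql.Driver",
                    user = "prashant",
                    password = "pandey")
    head(df)

    # Partitioned read: the partition column and bounds below are
    # illustrative assumptions and must match an integral column
    # that actually exists in your table.
    df2 <- read.jdbc("jdbc:postgresql://10.128.0.4:5432/sparkDB",
                     "survey_results",
                     partitionColumn = "respondent_id",
                     lowerBound = 1, upperBound = 100000,
                     numPartitions = 4,
                     driver = "org.postgresql.Driver",
                     user = "prashant",
                     password = "pandey")

Once read, `collect(df)` materializes the result as a local R data.frame, so your existing R code can run on it.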

Data can be written to JDBC using the following code:

write.jdbc(x, url, tableName, mode = "error", ...)

Arguments

`x`: a SparkDataFrame

`url`: JDBC database URL of the form `jdbc:subprotocol:subname`

`tableName`: the name of the table in the external database

`mode`: one of 'append', 'overwrite', 'error', or 'ignore' ('error' by default)

`...`: additional JDBC database connection properties
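This is also the SparkR counterpart to the spark.write question. A minimal sketch, reusing the connection details from the question (the target table name is an assumption):

    # Append the rows of df to a PostgreSQL table over JDBC.
    write.jdbc(df, "jdbc:postgresql://10.128.0.4:5432/sparkDB",
               "survey_results_backup",
               mode = "append",
               driver = "org.postgresql.Driver",
               user = "prashant",
               password = "pandey")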

Note: the PostgreSQL JDBC driver must be on the Spark classpath.
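One way to do that from SparkR is to let Spark fetch the driver at session start (the Maven coordinates below are an assumption; pick the version matching your PostgreSQL server):

    library(SparkR)
    # Download the PostgreSQL JDBC driver from Maven and put it on the
    # session's classpath; this must happen before the first session
    # is created.
    sparkR.session(sparkPackages = "org.postgresql:postgresql:42.2.6")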

Comments

  • Using jdbc gives me the following error: `Error in jdbc : java.sql.SQLException: No suitable driver ERROR RBackendHandler: jdbc on 16 failed java.lang.reflect.InvocationTargetException`
  • Add the PostgreSQL JDBC driver to the classpath: dataxone.com/import-export-postgresql-data-sparkr-dataframe
  • After adding the driver successfully, I get the following error when reading the table: `ERROR: operator does not exist: character varying = integer Hint: No operator matches the given name and argument type(s). You might need to add explicit type casts.`
  • So, any idea how we can define the type cast explicitly in predicates?
  • @VasudhaJain: I guess you are trying to compare an integer with a varchar. Check this answer: stackoverflow.com/a/25358092/5019163
