I am trying to load my PostgreSQL tables into Spark. I have successfully read a table from PostgreSQL into Spark using JDBC. I also have code written in R that I want to run on that table, but I cannot access the data from R.

I am using the following code to connect:

    val pgDF_table = spark.read
      .format("jdbc")
      .option("driver", "org.postgresql.Driver")
      .option("url", "jdbc:postgresql://10.128.0.4:5432/sparkDB")
      .option("dbtable", "survey_results")
      .option("user", "prashant")
      .option("password", "pandey")
      .load()

    pgDF_table.show

Is there any option like spark.write for writing the data back?

  • You may find spark.rstudio.com helpful. It allows you to define Spark jobs directly from R, rather than Scala, and bring data from Spark into R for further processing. Commented Jul 29, 2019 at 10:54
  • But my data is in PostgreSQL; there is nothing related to it there. Commented Jul 29, 2019 at 11:11

1 Answer


In SparkR, you can read data from JDBC using the following function:

read.jdbc(url, tableName, partitionColumn = NULL, lowerBound = NULL,
  upperBound = NULL, numPartitions = 0L, predicates = list(), ...)

Arguments

`url`: JDBC database URL of the form `jdbc:subprotocol:subname`

`tableName`: the name of the table in the external database

`partitionColumn`: the name of a column of integral type that will be used for partitioning

`lowerBound`: the minimum value of `partitionColumn` used to decide partition stride

`upperBound`: the maximum value of `partitionColumn` used to decide partition stride

`numPartitions`: the number of partitions. This, along with `lowerBound` (inclusive) and `upperBound` (exclusive), forms partition strides for the generated WHERE clause expressions used to split the column `partitionColumn` evenly. Defaults to SparkContext.defaultParallelism when unset.

`predicates`: a list of conditions in the WHERE clause; each one defines one partition
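For example, here is a minimal sketch against the database from the question (the URL, table name, and credentials are copied from the question; a running SparkR session with the PostgreSQL driver on the classpath is assumed, see the classpath note at the end of this answer):

    library(SparkR)
    sparkR.session()

    # Plain read: driver, user and password are passed through `...`
    # as JDBC connection properties.
    df <- read.jdbc("jdbc:postgresql://10.128.0.4:5432/sparkDB",
                    "survey_results",
                    driver = "org.postgresql.Driver",
                    user = "prashant",
                    password = "pandey")
    head(df)

    # Partitioned read: the partition column and bounds below are
    # illustrative assumptions and must match an integral column
    # that actually exists in your table.
    df2 <- read.jdbc("jdbc:postgresql://10.128.0.4:5432/sparkDB",
                     "survey_results",
                     partitionColumn = "respondent_id",
                     lowerBound = 1, upperBound = 100000,
                     numPartitions = 4,
                     driver = "org.postgresql.Driver",
                     user = "prashant",
                     password = "pandey")

Once read, `collect(df)` materializes the result as a local R data.frame, so your existing R code can run on it.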

Data can be written to JDBC using the following code:

write.jdbc(x, url, tableName, mode = "error", ...)

Arguments

`x`: a SparkDataFrame

`url`: JDBC database URL of the form `jdbc:subprotocol:subname`

`tableName`: the name of the table in the external database

`mode`: one of 'append', 'overwrite', 'error', or 'ignore' ('error' by default)

`...`: additional JDBC database connection properties
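This is also the SparkR counterpart to the spark.write question. A minimal sketch, reusing the connection details from the question (the target table name is an assumption):

    # Append the rows of df to a PostgreSQL table over JDBC.
    write.jdbc(df, "jdbc:postgresql://10.128.0.4:5432/sparkDB",
               "survey_results_backup",
               mode = "append",
               driver = "org.postgresql.Driver",
               user = "prashant",
               password = "pandey")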

Note: the PostgreSQL JDBC driver must be on the Spark classpath.
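One way to do that from SparkR is to let Spark fetch the driver at session start (the Maven coordinates below are an assumption; pick the version matching your PostgreSQL server):

    library(SparkR)
    # Download the PostgreSQL JDBC driver from Maven and put it on the
    # session's classpath; this must happen before the first session
    # is created.
    sparkR.session(sparkPackages = "org.postgresql:postgresql:42.2.6")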

Comments

  • Using jdbc gives me the following error: `Error in jdbc : java.sql.SQLException: No suitable driver ERROR RBackendHandler: jdbc on 16 failed java.lang.reflect.InvocationTargetException`
  • Add the PostgreSQL JDBC driver to the classpath: dataxone.com/import-export-postgresql-data-sparkr-dataframe
  • After adding the driver successfully, I get the following error when reading the table: `ERROR: operator does not exist: character varying = integer Hint: No operator matches the given name and argument type(s). You might need to add explicit type casts.`
  • So, any idea how we can define the type cast explicitly in predicates?
  • @VasudhaJain: I guess you are trying to compare an integer with a varchar. Check this answer: stackoverflow.com/a/25358092/5019163
