I am trying to connect to a PostgreSQL database on localhost:5432 of my computer using PySpark inside a Docker container. I develop in VS Code, which automatically builds and runs the container. This is the code I have:

password = ...
user = ...
url = 'jdbc:postgresql://127.0.0.1:5432/postgres'

spark = SparkSession.builder.config("spark.jars", "/opt/spark/jars/postgresql-42.2.5.jar") \
    .appName("PySpark_Postgres_test").getOrCreate()

df = connector.read.format("jbdc") \
    .option("url", url) \
    .option("dbtable", 'chicago_crime') \
    .option("user", user) \
    .option("password", password) \
    .option("driver", "org.postgresql.Driver") \
    .load()

I keep getting the same error:

"An error occurred while calling o358.load.\n: java.lang.ClassNotFoundException: \nFailed to find data source: jbdc. ...

Maybe the url is not correct?

url = 'jdbc:postgresql://127.0.0.1:5432/postgres'

The database listens on port 5432 and is named postgres. It runs on my machine's localhost, but since I am working inside a Docker container, I assumed the correct way would be to enter the IP address of the laptop's localhost, 127.0.0.1, because typing localhost would refer to the localhost of the Docker container. Or should I use the IPv4 address (Wireless LAN .. or WSL)?

Does anyone know what's wrong?

PS: one of the commands in my Dockerfile is the following:

RUN wget https://jdbc.postgresql.org/download/postgresql-42.2.5.jar -P /opt/spark/jars
  • Can you show the Docker Compose setup you are using to launch the database? Commented Nov 13, 2022 at 10:16
  • I am not using Docker Compose. Should I, to map the ports or something? Commented Nov 13, 2022 at 10:53
  • If you're not using docker-compose, can you share the docker run command you're using? Also, can you try once with host.docker.internal:5432 instead of 127.0.0.1:5432 and let us know the results? Commented Nov 13, 2022 at 10:55
  • I forgot to say that I do the development in VS Code. VS Code builds and runs the container. Commented Nov 13, 2022 at 11:12

1 Answer

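from pyspark.sql import SparkSession

# Register the PostgreSQL JDBC driver jar that the Dockerfile downloads.
spark = SparkSession \
    .builder \
    .appName("Python Spark SQL basic example") \
    .config("spark.jars", "/opt/spark/jars/postgresql-42.2.5.jar") \
    .getOrCreate()

# The format name is "jdbc" (the question had "jbdc", which causes the
# "Failed to find data source" error). The URL uses host.docker.internal
# instead of 127.0.0.1, which inside the container is the container itself.
df = spark.read \
    .format("jdbc") \
    .option("url", "jdbc:postgresql://host.docker.internal:5432/postgres") \
    .option("dbtable", "chicago_crime") \
    .option("user", "postgres") \
    .option("password", "postgres") \
    .option("driver", "org.postgresql.Driver") \
    .load()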
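
If the read still fails after these changes, it may help to rule out networking separately from Spark. The following is a minimal sketch, not part of the original answer: the helper names build_jdbc_url and can_reach are hypothetical, and it assumes host.docker.internal resolves inside your container (Docker Desktop normally provides this name; on a plain Linux Docker Engine it may need to be added explicitly).

```python
import socket


def build_jdbc_url(host: str, port: int, database: str) -> str:
    """Assemble a PostgreSQL JDBC URL of the form used in the answer."""
    return f"jdbc:postgresql://{host}:{port}/{database}"


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a plain TCP connection to host:port succeeds.

    This only checks that something accepts connections on that port;
    it does not verify credentials or that the server is PostgreSQL.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

Calling can_reach("host.docker.internal", 5432) from inside the container before starting Spark shows whether the database port is reachable at all. If it returns False, the problem is likely in the Docker networking or PostgreSQL listen settings rather than in the PySpark code, and build_jdbc_url("host.docker.internal", 5432, "postgres") gives the URL to pass to the "url" option once it returns True.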